Dynamic natural language understanding

ABSTRACT

Methods and systems for dynamic natural language understanding. A hierarchical structure of semantic categories is exploited to assist in the natural language understanding. Optionally, the natural language to be understood includes a request.

This application is a continuation of application Ser. No. 11/562,132 filed Nov. 21, 2006, which is a continuation of application Ser. No. 10/097,537 filed Mar. 13, 2002, which claims the priority of provisional application Ser. No. 60/275,598 filed on Mar. 13, 2001, all of which are incorporated herein by reference.

FIELD OF THE INVENTION

This invention relates to natural language understanding.

BACKGROUND OF THE INVENTION

Natural language understanding systems and methods traditionally use strict grammar or statistics.

Grammar based natural language understanding systems and methods typically use a parser to parse a text into a tree, i.e. a hierarchical (“depth”) structure. Elements of the tree are processed in a hierarchical manner, either bottom up or top down. In order to achieve successful understanding of the text, the sentence structure/grammar generally needs to conform to rules, thereby placing constraints on the freedom of expression of the submitter of the text.

Statistically based natural language understanding systems and methods typically use many statistical methods, including classification, to understand a text. Freedom of expression by the submitter of the text is therefore enhanced.

Systems of the related art include the following:

U.S. Pat. No. 5,680,511 to Baker, et al., in one aspect, provides word recognition systems that operate to recognize an unrecognized or ambiguous word that occurs within a passage of words. The system can offer several words as choice words for inserting into the passage to replace the unrecognized word. The system can select the best choice word by using the choice word to extract, from a reference source, sample passages of text that relate to the choice word. For example, the system can select the dictionary passage that defines the choice word. The system then compares the selected passage to the current passage, and generates a score that indicates the likelihood that the choice word would occur within that passage of text. The system can select the choice word with the best score to substitute into the passage. The passage of words being analyzed can be any word sequence including an utterance, a portion of handwritten text, a portion of typewritten text or other such sequence of words, numbers and characters. Alternative embodiments of that invention are disclosed which function to retrieve documents from a library as a function of context.

U.S. Pat. No. 5,642,519 to Martin provides a unified grammar for a speech interpreter capable of real-time speech understanding for user applications running on a general purpose microprocessor-based computer. The speech interpreter includes a unified grammar (UG) compiler, a speech recognizer and a natural language (NL) processor. The UG compiler receives a common UG lexicon and unified grammar description, and generates harmonized speech recognition (SR) and NL grammars for the speech recognizer and natural language processor, respectively. The lexicon includes a plurality of UG word entries having predefined characteristics, i.e., features, while the UG description includes a plurality of complex UG rules which define grammatically allowable word sequences. The UG compiler converts the complex UG rules (complex UG rules include augmentations for constraining the UG rules) into permissible SR word sequences and SR simple rules (simple rules do not include any augmentation) for the SR grammar. The SR grammar is a compact representation of the SR word entries corresponding to the UG word entries, permissible SR word sequences and simple SR rules corresponding to the augmentations of the complex UG rules. The NL grammar provides the NL processor with NL patterns enabling the NL processor to extract the meaning of the validated word sequences passed from the speech recognizer.

U.S. Pat. No. 5,991,712, also to Martin, teaches that improved word accuracy of speech recognition can be achieved by providing a scheme for automatically limiting the acceptable word sequences. Speech recognition systems consistent with that invention include a lexicon database with words and associated lexical properties. The systems receive exemplary clauses containing permissible word combinations for speech recognition, and identify additional lexical properties for selected words in the lexicon database corresponding to words in the received exemplary clauses using lexical property tests of a grammar database. Certain lexical property tests are switchable to a disabled state. To identify the additional lexical properties, the exemplary clauses are parsed with the switchable lexical property tests disabled to produce an index of the lexical properties corresponding to the exemplary clauses. The lexicon database is updated with the identified additional lexical properties by assigning the lexical properties to the corresponding words of the lexicon database. The grammar database is compiled with the lexical property tests enabled and the lexicon database with the assigned lexical properties to produce a grammar that embodies constraints of the lexical property tests and the lexical properties.

U.S. Pat. No. 5,918,222 to Fukui, et al. teaches a data storage means for storing data in a predetermined information form. An information retrieval means retrieves the data stored in the data storage means. A reception means receives an information disclosure demand from a demander. A response rule storage means stores general knowledge for generating a response to the demander and personal relationship information associated with a unique personal relationship between a user having the data on an information provider side and a user on an information demander side. A response plan formation means, responsive to the demand received by the reception means, plans a response for exhibiting, to the information demander, data obtained by causing the retrieval means to retrieve the data stored in the data storage means on the basis of the knowledge and the personal relationship information stored in the response rule storage means. A response generation means generates the response to the information demander in accordance with the plan formed by the response plan formation means.

U.S. Pat. No. 5,987,404 to Della Pietra, et al. proposes using statistical methods to do natural language understanding. The key notion is that there are “strings” of words in the natural language that correspond to a single semantic concept. One can then define an alignment between an entire semantic meaning (consisting of a set of semantic concepts) and the English. This is modeled using P(E,A|S), where E is the English text, A the alignment and S the semantic meaning. One can model P(S) separately. This allows each parameter to be modeled using many different statistical models.
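Read as a noisy-channel model, the decomposition described above can be written as follows. This is a sketch of the standard formulation, assuming E denotes the English word string, A the alignment and S the semantic meaning; the exact factorization used by Della Pietra et al. is not reproduced here.

```latex
P(E, S) = P(S) \sum_{A} P(E, A \mid S),
\qquad
\hat{S} = \arg\max_{S} \; P(S) \sum_{A} P(E, A \mid S)
```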

U.S. Pat. No. 5,576,954 to Driscoll teaches a procedure for determining text relevancy that can be used to enhance the retrieval of text documents by search queries. This system helps a user intelligently and rapidly locate information found in large textual databases. A first embodiment determines the common meanings between each word in the query and each word in the document. Then an adjustment is made for words in the query that are not in the documents. Further, weights are calculated for both the semantic components in the query and the semantic components in the documents. These weights are multiplied together, and their products are subsequently added to one another to determine a real value number (similarity coefficient) for each document. Finally, the documents are sorted in sequential order according to their real value number from largest to smallest value. Another embodiment is for routing documents to topics/headings (sometimes referred to as filtering). Here, the importance of each word in both topics and documents is calculated. Then, the real value number (similarity coefficient) for each document is determined. Then each document is routed one at a time according to its respective real value number to one or more topics. Finally, once the documents are located with their topics, the documents can be sorted. This system can be used to search and route all kinds of document collections, such as collections of legal documents, medical documents, news stories, and patents.
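The similarity coefficient of the first embodiment can be sketched as below. This is a minimal illustration, assuming each semantic component is represented by a string key with a precomputed weight; the function names are hypothetical, and the adjustment for query words absent from a document is reduced to those words contributing nothing to the sum.

```python
from typing import Dict, List, Tuple


def similarity_coefficient(query_weights: Dict[str, float],
                           doc_weights: Dict[str, float]) -> float:
    # Multiply the weights of each semantic component found in both the query
    # and the document, then add the products together.
    return sum(w * doc_weights[c] for c, w in query_weights.items() if c in doc_weights)


def rank_documents(query_weights: Dict[str, float],
                   docs: Dict[str, Dict[str, float]]) -> List[Tuple[str, float]]:
    scored = [(doc_id, similarity_coefficient(query_weights, weights))
              for doc_id, weights in docs.items()]
    # Sort from largest to smallest similarity coefficient.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

For a query weighted {"contract": 0.8, "breach": 0.6}, a document sharing both components outranks one sharing only "contract", provided its weights are comparable.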

U.S. Pat. No. 5,642,502, also to Driscoll, teaches a system and method for retrieving relevant documents from a text database collection comprised of patents, medical and legal documents, journals, news stories and the like. Each small piece of text within the documents, such as a sentence, phrase or semantic unit in the database, is treated as a document. Natural language queries are used to search for relevant documents in the database. A first search query creates a selected group of documents. Each word in both the search query and the documents is given a weighted value. Combining the weighted values creates similarity values for each document, which are then ranked according to their relevance to the search query. A user reading through this ranked list checks off which documents are relevant and which are not. The system then automatically updates the original search query into a second search query, which can include the same words, fewer words or different words than the first search query. Words in the second search query can have the same or different weights compared to the first search query. The system automatically searches the text database and creates a second group of documents, which at a minimum does not include at least one of the documents found in the first group. The second group can also comprise additional documents not found in the first group. The ranking of documents in the second group is different from the first ranking, such that the more relevant documents are found closer to the top of the list.

U.S. Pat. No. 5,893,092, also to Driscoll, teaches a search system and method for retrieving relevant documents from a text database collection comprised of patents, medical and legal documents, journals, news stories and the like. Each small piece of text within the documents, such as a sentence, phrase or semantic unit in the database, is treated as a document. Natural language queries are used to search for relevant documents in the database. A first search query creates a selected group of documents. Each word in both the search query and the documents is given a weighted value. Combining the weighted values creates similarity values for each document, which are then ranked according to their relevance to the search query. A user reading through this ranked list checks off which documents are relevant and which are not. The system then automatically updates the original search query into a second search query, which can include the same words, fewer words or different words than the first search query. Words in the second search query can have the same or different weights compared to the first search query. The system automatically searches the text database and creates a second group of documents, which at a minimum does not include at least one of the documents found in the first group. The second group can also comprise additional documents not found in the first group. The ranking of documents in the second group is different from the first ranking, such that the more relevant documents are found closer to the top of the list.

U.S. Pat. No. 6,088,692, also to Driscoll, teaches a natural language search system and method for retrieving relevant documents from a text database collection comprised of patents, medical and legal documents, journals, news stories and the like. Each small piece of text within the documents, such as a sentence, phrase or semantic unit in the database, is treated as a document. Natural language queries are used to search for relevant documents in the database. A first search query creates a selected group of documents. Each word in both the search query and the documents is given a weighted value. Combining the weighted values creates similarity values for each document, which are then ranked according to their relevance to the search query. A user reading through this ranked list checks off which documents are relevant and which are not. The system then automatically updates the original search query into a second search query, which can include the same words, fewer words or different words than the first search query. Words in the second search query can have the same or different weights compared to the first search query. The system automatically searches the text database and creates a second group of documents, which at a minimum does not include at least one of the documents found in the first group. The second group can also comprise additional documents not found in the first group. The ranking of documents in the second group is different from the first ranking, such that the more relevant documents are found closer to the top of the list.

U.S. Pat. No. 5,694,592, also to Driscoll, teaches a procedure for determining text relevancy that can be used to enhance the retrieval of text documents by search queries. This system helps a user intelligently and rapidly locate information found in large textual databases. A first embodiment determines the common meanings between each word in the query and each word in the document. Then an adjustment is made for words in the query that are not in the documents. Further, weights are calculated for both the semantic components in the query and the semantic components in the documents. These weights are multiplied together, and their products are subsequently added to one another to determine a real value number (similarity coefficient) for each document. Finally, the documents are sorted in sequential order according to their real value number from largest to smallest value. Another embodiment is for routing documents to topics/headings (sometimes referred to as filtering). Here, the importance of each word in both topics and documents is calculated. Then, the real value number (similarity coefficient) for each document is determined. Then each document is routed one at a time according to its respective real value number to one or more topics. Finally, once the documents are located with their topics, the documents can be sorted. This system can be used to search and route all kinds of document collections, such as collections of legal documents, medical documents, news stories, and patents.

U.S. Pat. No. 6,138,085 to Richardson, et al. teaches a facility for determining, for a semantic relation that does not occur in a lexical knowledge base, whether this semantic relation should be inferred despite its absence from the lexical knowledge base. This semantic relation to be inferred is preferably made up of a first word, a second word, and a relation type relating the meanings of the first and second words. In a preferred embodiment, the facility identifies a salient semantic relation having the relation type of the semantic relation to be inferred and relating the first word to an intermediate word other than the second word. The facility then generates a quantitative measure of the similarity in meaning between the intermediate word and the second word. The facility further generates a confidence weight for the semantic relation to be inferred based upon the generated measure of similarity in meaning between the intermediate word and the second word. The facility may also generate a confidence weight for the semantic relation to be inferred based upon the weights of one or more paths connecting the first and second words.

U.S. Pat. No. 5,675,710 to Lewis teaches a method and apparatus for training a text classifier. A supervised learning system and an annotation system are operated cooperatively to produce a classification vector which can be used to classify documents with respect to a defined class. The annotation system automatically annotates documents with a degree of relevance annotation to produce machine annotated data. The degree of relevance annotation represents the degree to which the document belongs to the defined class. This machine annotated data is used as input to the supervised learning system. In addition to the machine annotated data, the supervised learning system can also receive manually annotated data and/or a user request. The machine annotated data, along with the manually annotated data and/or the user request, are used by the supervised learning system to produce a classification vector. In one embodiment, the supervised learning system comprises a relevance feedback mechanism. The relevance feedback mechanism is operated cooperatively with the annotation system for multiple iterations until a classification vector of acceptable accuracy is produced. The classification vector produced by that invention is the result of a combination of supervised and unsupervised learning.

U.S. Pat. No. 6,311,152 to Bai, et al. teaches a system (100, 200) for tokenization and named entity recognition of ideographic language. In the system, a word lattice is generated for a string of ideographic characters using finite state grammars (150) and a system lexicon (240). Segmented text is generated by determining word boundaries in the string of ideographic characters using the word lattice dependent upon a contextual language model (152A) and one or more entity language models (152B). One or more named entities are recognized in the string of ideographic characters using the word lattice dependent upon the contextual language model (152A) and the one or more entity language models (152B). The contextual language model (152A) and the one or more entity language models (152B) are each class-based language models. The lexicon (240) includes single ideographic characters, words, and predetermined features of the characters and words.
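The lattice-based segmentation step can be sketched as a best-path search over a word lattice. This sketch substitutes a plain unigram lexicon of word probabilities for the finite state grammars and class-based language models described above; build_lattice and segment are hypothetical names, and ASCII letters stand in for ideographic characters.

```python
import math
from typing import Dict, List, Optional, Tuple


def build_lattice(text: str, lexicon: Dict[str, float],
                  max_len: int = 4) -> List[List[Tuple[int, str]]]:
    # lattice[i] holds an edge (end, word) for every lexicon word starting at i.
    lattice: List[List[Tuple[int, str]]] = [[] for _ in range(len(text))]
    for i in range(len(text)):
        for j in range(i + 1, min(len(text), i + max_len) + 1):
            if text[i:j] in lexicon:
                lattice[i].append((j, text[i:j]))
    return lattice


def segment(text: str, lexicon: Dict[str, float],
            max_len: int = 4) -> Optional[List[str]]:
    lattice = build_lattice(text, lexicon, max_len)
    n = len(text)
    best = [-math.inf] * (n + 1)  # best log-probability of reaching position k
    back: List[Tuple[int, str]] = [(0, "")] * (n + 1)
    best[0] = 0.0
    for i in range(n):
        if best[i] == -math.inf:
            continue  # position i is unreachable
        for j, word in lattice[i]:
            score = best[i] + math.log(lexicon[word])
            if score > best[j]:
                best[j], back[j] = score, (i, word)
    if best[n] == -math.inf:
        return None  # no path through the lattice covers the whole string
    words: List[str] = []
    k = n
    while k > 0:  # follow back-pointers from the end of the string
        k, word = back[k]
        words.append(word)
    return words[::-1]
```

With the lexicon {"ab": 0.3, "a": 0.2, "b": 0.2, "c": 0.3, "abc": 0.05}, segment("abc", lexicon) returns ["ab", "c"], because 0.3 × 0.3 exceeds both 0.05 and 0.2 × 0.2 × 0.3.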

What is needed in the art is a method and system for understanding natural language that includes, inter alia, statistical steps and elements which also take advantage of hierarchical structure. What is also needed in the art is a system and method where the extraction of one part of a text which belongs to one semantic category assists in the extraction of another part which belongs to a semantic category of a different hierarchical level. In addition, what is needed in the art is a method and system for understanding natural language where later steps of the process are affected by the results of earlier steps, thereby introducing a dynamic aspect to the method and system.

SUMMARY OF THE INVENTION

According to the present invention, there is provided a method for use in a method for understanding a natural language text, comprising performing the following selectively in a statistical manner: attempting to extract at least one value belonging to a semantic category from a natural language text or a form thereof; and if a result of the attempting complies with a predetermined criterion, attempting to extract, based on the result, at least one value belonging to another semantic category of a different hierarchical level than the semantic category, else performing at least one action from a group of actions including: asking a submitter of the text a question whose content depends on the result and giving up on understanding the natural language text.
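A single step of this method can be sketched as follows. The sketch assumes keyword lookups as stand-ins for the statistical classifiers and uses "exactly one value extracted" as the predetermined criterion; the two-level travel/banking vocabulary and all function names are hypothetical.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical keyword tables standing in for statistical classifiers.
CATEGORY_KEYWORDS = {"flight": "travel", "hotel": "travel", "balance": "banking"}
SUBCATEGORY_KEYWORDS = {
    "travel": {"book": "reserve", "cancel": "cancel"},
    "banking": {"check": "query", "transfer": "transfer"},
}


def extract_category(text: str) -> List[str]:
    words = text.lower().split()
    return sorted({CATEGORY_KEYWORDS[w] for w in words if w in CATEGORY_KEYWORDS})


def extract_subcategory(text: str, category: str) -> List[str]:
    # The higher-level result selects which lower-level vocabulary is used.
    table = SUBCATEGORY_KEYWORDS[category]
    return sorted({table[w] for w in text.lower().split() if w in table})


def understand_step(text: str,
                    ask: Optional[Callable[[str], str]] = None) -> Tuple[str, object]:
    categories = extract_category(text)
    if len(categories) == 1:  # predetermined criterion: exactly one value
        return ("next", extract_subcategory(text, categories[0]))
    if ask is not None:
        # The content of the question depends on the result.
        if categories:
            question = "Which did you mean: " + " or ".join(categories) + "?"
        else:
            question = "What would you like to do?"
        return ("ask", ask(question))
    return ("give_up", None)
```

Here understand_step("Book a flight") moves on to the subcategory level and yields ("next", ["reserve"]), while an ambiguous text such as "hotel balance" either produces a question naming both candidate categories or, when no dialog callback is supplied, gives up.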

In one embodiment, the predetermined criterion is at least one from a group including: at least one value for the semantic category was extracted, only one value for the semantic category was extracted, one of the at least one value extracted for the semantic category is selected based on a grade thereof, a correct number of values for the semantic category were extracted, a correct number of values for the semantic category are selected based on grades thereof from among the at least one value extracted for the semantic category, at least some values belonging to other previously extracted at least one semantic category are appropriate for at least one value extracted for the semantic category, values belonging to other previously extracted at least one semantic category are appropriate for only one value extracted for the semantic category, the semantic category is a particular semantic category where an unlimited number of extracted values is allowed, it is desired to process in parallel more than one extracted value for the semantic category, there is a default value corresponding to each required value for the semantic category which was not extracted, there is only one possible value for the semantic category, and there is only a correct number of possible values for the semantic category.
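A few of these criteria can be expressed as simple predicates over graded classifier output. The sketch below is illustrative only: the (value, grade) representation, the 0.5 threshold, the tie rule in select_by_grade and all function names are assumptions, not values taken from the invention.

```python
from typing import Dict, List, Optional, Tuple

Graded = List[Tuple[str, float]]  # (value, grade) pairs returned by a classifier


def only_one_value(values: Graded) -> bool:
    return len(values) == 1


def select_by_grade(values: Graded, threshold: float = 0.5) -> Optional[str]:
    # Hypothetical rule: accept the best value if its grade reaches the
    # threshold and no other value has the same grade.
    if not values:
        return None
    ranked = sorted(values, key=lambda vg: vg[1], reverse=True)
    best_value, best_grade = ranked[0]
    if best_grade < threshold:
        return None
    if len(ranked) > 1 and ranked[1][1] == best_grade:
        return None
    return best_value


def fill_defaults(extracted: Dict[str, str], required: List[str],
                  defaults: Dict[str, str]) -> Optional[Dict[str, str]]:
    # Criterion: every required value that was not extracted has a default.
    filled = dict(extracted)
    for name in required:
        if name not in filled:
            if name not in defaults:
                return None
            filled[name] = defaults[name]
    return filled


def complies(values: Graded) -> bool:
    # "At least one from a group": any single criterion is enough.
    return only_one_value(values) or select_by_grade(values) is not None
```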

According to the present invention, there is also provided a method for understanding a natural language text, comprising: receiving a natural language text; processing each at least two semantic categories, the each on a different hierarchical level, by performing the following selectively in a statistical manner: (i) attempting to determine at least one value belonging to the each semantic category through extraction, wherein if the each semantic category is not a first processed of the at least two semantic categories, then the attempting is based on results of previously processed semantic categories, and (ii) if the each semantic category is not a last processed of the at least two semantic categories and a result of the attempting does not comply with a predetermined criterion, dialoging with a submitter of the text and receiving at least one answer from the submitter, wherein at least one value determined from the at least one answer augments the result so as to comply with the predetermined criterion and allow extraction attempts for other of the at least two semantic categories to be subsequently processed; and evaluating values determined for the at least two semantic categories with respect to one another to determine whether the values are sufficient to understand the text, and if the values are not sufficient: dialoging with the submitter, receiving at least one answer from the submitter, determining from the at least one answer at least one value belonging to at least one of the at least two semantic categories, the at least one value in conjunction with earlier determined values being sufficient to understand the text.
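The control flow of this method can be sketched as a loop over semantic categories ordered by hierarchical level. In this sketch the extractors, the criterion, the dialog callbacks and the turn limit are all hypothetical parameters supplied by the caller, and returning None stands for giving up.

```python
from typing import Callable, Dict, List, Optional, Tuple

Results = Dict[str, List[str]]
Extractor = Callable[[str, Results], List[str]]


def understand(text: str,
               categories: List[Tuple[str, Extractor]],
               complies: Callable[[List[str]], bool],
               ask: Callable[[str, List[str]], List[str]],
               sufficient: Callable[[Results], bool],
               ask_missing: Callable[[Results], Tuple[str, str]],
               max_turns: int = 3) -> Optional[Results]:
    results: Results = {}
    for index, (name, extract) in enumerate(categories):
        # (i) Extraction may depend on previously processed categories.
        values = extract(text, results)
        is_last = index == len(categories) - 1
        turns = 0
        # (ii) Except for the last category, dialog until the criterion holds.
        while not is_last and not complies(values) and turns < max_turns:
            values = ask(name, values)  # the submitter's answer yields the augmented result
            turns += 1
        if not is_last and not complies(values):
            return None  # give up
        results[name] = values
    # Evaluate all values with respect to one another; dialog if insufficient.
    turns = 0
    while not sufficient(results):
        if turns == max_turns:
            return None
        name, value = ask_missing(results)
        results.setdefault(name, []).append(value)
        turns += 1
    return results
```

For example, with a category extractor that maps "flight" to "travel" and an action extractor whose vocabulary depends on the chosen category, the text "book it please" yields no category, triggers one dialog turn (answered here by a hypothetical callback returning "travel"), and then extracts the action "book" using the category supplied by the submitter.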

According to the present invention, there is further provided a method for training at least two classifiers to understand a natural language text, comprising: introducing entries into a database, the entries belonging to at least two semantic categories of different hierarchical levels; defining examples of natural language texts, wherein at least some of the examples include embedded syntactic tokens based on the entries; and training at least two classifiers for the at least two semantic categories using the examples or a form thereof.
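The training method can be sketched with a toy database and a small multinomial naive Bayes classifier per semantic category. Naive Bayes is swapped in here purely for illustration; the entries, the <CITY> and <DAY> tokens, the example texts and all names are hypothetical. One classifier is trained for the higher-level category and one per category for the lower-level subcategory, all on examples in which database entries have been replaced by syntactic tokens.

```python
import math
import re
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

# Hypothetical database entries mapped to the syntactic token of their category.
ENTRIES = {"new york": "<CITY>", "boston": "<CITY>", "monday": "<DAY>"}


def embed_tokens(text: str, entries: Dict[str, str]) -> str:
    out = text.lower()
    # Longest entries first, so "new york" is replaced as one unit.
    for value in sorted(entries, key=len, reverse=True):
        out = re.sub(r"\b" + re.escape(value) + r"\b", entries[value], out)
    return out


class NaiveBayes:
    """Tiny multinomial naive Bayes with add-one smoothing (illustrative only)."""

    def __init__(self) -> None:
        self.word_counts: Dict[str, Counter] = defaultdict(Counter)
        self.label_counts: Counter = Counter()
        self.vocab: set = set()

    def train(self, examples: List[Tuple[str, str]]) -> "NaiveBayes":
        for text, label in examples:
            words = text.split()
            self.label_counts[label] += 1
            self.word_counts[label].update(words)
            self.vocab.update(words)
        return self

    def classify(self, text: str) -> str:
        total = sum(self.label_counts.values())

        def log_score(label: str) -> float:
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(self.label_counts[label] / total)
            for w in text.split():
                score += math.log((counts[w] + 1) / denom)
            return score

        return max(self.label_counts, key=log_score)


def train_hierarchy(examples: List[Tuple[str, str, str]], entries: Dict[str, str]):
    # Each example is (text, category, subcategory); tokens replace entries.
    tokenized = [(embed_tokens(t, entries), c, s) for t, c, s in examples]
    top = NaiveBayes().train([(t, c) for t, c, _ in tokenized])
    sub = {cat: NaiveBayes().train([(t, s) for t, c, s in tokenized if c == cat])
           for cat in {c for _, c, _ in tokenized}}
    return top, sub
```

Because "Boston" and "New York" both become <CITY>, an example mentioning one city also teaches the classifiers about texts mentioning the other.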

According to the present invention, there is provided a module for use in a system for natural language understanding, comprising: at least one classifier or pseudo classifier configured to extract values belonging to a semantic category from a natural language text or a form thereof; and an action resolver configured, if a result of extracting values of the semantic category complies with a predetermined criterion, to employ, based on the result, at least one classifier or pseudo classifier to extract values belonging to another semantic category of a different hierarchical level, and configured, if the result does not comply with the predetermined criterion, to perform at least one action from a group of actions including: employing, based on the result, a dialog management module and giving up on understanding the natural language text.

According to the present invention, there is also provided a system for natural language understanding, comprising: at least two classifiers or pseudo classifiers configured to extract values belonging to at least two semantic categories on different hierarchical levels from a natural language text or a form thereof; a dialog management module configured to dialog with a submitter of the natural language text; at least one evaluation module configured to evaluate values belonging to the at least two semantic categories; and an action resolver configured to cause the text to be understood by (i) employing, if a result of extracting values of a semantic category complies with a predetermined criterion and the semantic category is not a last to be processed semantic category, a classifier or pseudo classifier based on the result to extract values belonging to another semantic category, by (ii) employing, if the result does not comply with a predetermined criterion and the semantic category is not a last to be processed semantic category, a dialog management module and then employing, based on the result as augmented by at least one answer received from the submitter by the dialog management module, a classifier or pseudo classifier to extract values belonging to another semantic category, and by (iii) employing the evaluation module to evaluate the values of the at least two semantic categories in relation to one another in order to determine if the values are sufficient to understand the text and, if the values are not sufficient, employing the dialog management module to determine at least one value, the at least one value in conjunction with the values being sufficient to understand the text.

According to the present invention, there is further provided a system for training classifiers for natural language understanding, comprising: a real time database including entries related to semantic categories on at least two different hierarchical levels; classifiers for the semantic categories; and a knowledge work tool configured to develop syntactic tokens from the entries, embed the tokens in examples and train the classifiers at least partially on the examples.

According to the present invention, there is still further provided a method for understanding a natural language text, comprising performing the following in a selectively statistical manner: receiving a natural language text; extracting at least one parameter value from the text or a form thereof; identifying at least one parameter type related to each extracted parameter value; providing at least one restatement of the received text, each at least one restatement having embedded within it at least one of the identified parameter types; extracting at least one overall category value from the at least one restatement or a form thereof; selecting a subcategory extractor corresponding to one of the extracted at least one overall category, and using the selected subcategory extractor to extract at least one subcategory value; choosing one of the at least one extracted subcategory values; evaluating the at least one identified parameter type in relation to the chosen subcategory value; and concluding that the natural language text is understood.
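The sequence of steps in this method can be sketched with keyword tables standing in for the statistical extractors. The parameter lexicon, the category and subcategory tables, the required parameter types per subcategory and the function names are hypothetical; returning None marks the points where the method would instead dialog with the submitter.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical parameter lexicon: parameter value -> parameter type token.
PARAMETERS = {"boston": "<CITY>", "paris": "<CITY>", "100": "<AMOUNT>"}
OVERALL = {"flight": "travel", "fly": "travel", "transfer": "banking", "pay": "banking"}
REQUIRED = {"book_flight": {"<CITY>"}, "cancel_flight": set(),
            "transfer_money": {"<AMOUNT>"}, "pay_bill": {"<AMOUNT>"}}


def keyword_extractor(table: Dict[str, str]) -> Callable[[List[str]], List[str]]:
    return lambda words: [table[w] for w in words if w in table]


# One subcategory extractor per overall category value.
SUBCATEGORY_EXTRACTORS = {
    "travel": keyword_extractor({"book": "book_flight", "cancel": "cancel_flight"}),
    "banking": keyword_extractor({"transfer": "transfer_money", "pay": "pay_bill"}),
}


def extract_parameters(text: str) -> List[Tuple[str, str]]:
    # Each extracted parameter value is paired with its parameter type.
    return [(w, PARAMETERS[w]) for w in text.lower().split() if w in PARAMETERS]


def restate(text: str, params: List[Tuple[str, str]]) -> str:
    # Embed the identified parameter types in place of their values.
    types = dict(params)
    return " ".join(types.get(w, w) for w in text.lower().split())


def understand(text: str) -> Optional[Dict[str, object]]:
    params = extract_parameters(text)
    words = restate(text, params).split()
    overall = sorted({OVERALL[w] for w in words if w in OVERALL})
    if len(overall) != 1:
        return None  # a dialog with the submitter would be needed here
    category = overall[0]
    subcategories = SUBCATEGORY_EXTRACTORS[category](words)
    if not subcategories:
        return None
    chosen = subcategories[0]  # hypothetical choice rule: first match
    # Evaluate the identified parameter types against the chosen subcategory.
    if not REQUIRED[chosen] <= {t for _, t in params}:
        return None
    return {"category": category, "subcategory": chosen, "parameters": params}
```

For instance, "transfer money" is recognized as a banking transfer but returns None because no <AMOUNT> parameter was identified, which is where the method would ask the submitter for the missing value.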

According to the present invention, there is yet further provided a system for understanding a natural language text, comprising: one classifier configured to extract an overall category value from a natural language text or a form thereof; a different classifier corresponding to each overall category value configured to extract subcategory values from a natural language text or a form thereof; one classifier configured to extract parameter values from a natural language text or a form thereof; a dialog management module configured to dialog with a submitter of the natural language text; at least one evaluation component configured to evaluate extracted values; and an action resolver configured to employ different parts of the system in turn in order to understand the natural language text, including employing the one classifier for parameter values before the one overall category classifier and employing the overall category classifier before the corresponding subcategory classifier.

According to the present invention, there is provided a program storage device readable by machine, tangibly embodying a program of instructions executable by the machine to perform method steps for use in a method for understanding a natural language text, comprising performing the following selectively in a statistical manner: attempting to extract at least one value belonging to a semantic category from a natural language text or a form thereof; and if a result of the attempting complies with a predetermined criterion, attempting to extract, based on the result, at least one value belonging to another semantic category of a different hierarchical level than the semantic category, else performing at least one action from a group of actions including: asking a submitter of the text a question whose content depends on the result and giving up on understanding the natural language text.

According to the present invention, there is also provided a computer program product comprising a computer useable medium having computer readable program code embodied therein for use in a computer program product comprising: computer readable program code for causing the computer to perform the following selectively in a statistical manner: computer readable program code for causing the computer to attempt to extract at least one value belonging to a semantic category from a natural language text or a form thereof; and computer readable program code for causing the computer, if a result of the attempting complies with a predetermined criterion, to attempt to extract, based on the result, at least one value belonging to another semantic category of a different hierarchical level than the semantic category, else performing at least one action from a group of actions including: asking a submitter of the text a question whose content depends on the result and giving up on understanding the natural language text.

According to the present invention, there is further provided a program storage device readable by machine, tangibly embodying a program of instructions executable by the machine to perform method steps for understanding a natural language text, comprising: receiving a natural language text; processing each at least two semantic categories, the each on a different hierarchical level, by performing the following selectively in a statistical manner: (i) attempting to determine at least one value belonging to the each semantic category through extraction, wherein if the each semantic category is not a first processed of the at least two semantic categories, then the attempting is based on results of previously processed semantic categories, and (ii) if the each semantic category is not a last processed of the at least two semantic categories and a result of the attempting does not comply with a predetermined criterion, dialoging with a submitter of the text and receiving at least one answer from the submitter, wherein at least one value determined from the at least one answer augments the result so as to comply with the predetermined criterion and allow extraction attempts for other of the at least two semantic categories to be subsequently processed; and evaluating values determined for the at least two semantic categories with respect to one another to determine whether the values are sufficient to understand the text, and if the values are not sufficient: dialoging with the submitter, receiving at least one answer from the submitter, determining from the at least one answer at least one value belonging to at least one of the at least two semantic categories, the at least one value in conjunction with earlier determined values being sufficient to understand the text.

According to the present invention, there is still further provided acomputer program product comprising a computer useable medium havingcomputer readable program code embodied therein for understanding anatural language text, the computer program product comprising: computerreadable program code for causing the computer to receive a naturallanguage text; computer readable program code for causing the computerto process each at least two semantic categories, the each on adifferent hierarchical level, by performing the following selectively ina statistical manner: computer readable program code for causing thecomputer to

(i) attempt to determine at least one value belonging to that semantic category through extraction, wherein if that semantic category is not the first processed of the at least two semantic categories, then the attempting is based on results of previously processed semantic categories, and computer readable program code for causing the computer to (ii) if that semantic category is not the last processed of the at least two semantic categories, and a result of the attempting does not comply with a predetermined criterion, dialog with a submitter of the text and receive at least one answer from the submitter, wherein at least one value determined from the at least one answer augments the result so as to comply with the predetermined criterion and allow extraction attempts for the others of the at least two semantic categories to be subsequently processed; and computer readable program code for causing the computer to: evaluate values determined for the at least two semantic categories with respect to one another to determine whether the values are sufficient to understand the text, and if the values are not sufficient: dialog with the submitter, receive at least one answer from the submitter, and determine from the at least one answer at least one value belonging to at least one of the at least two semantic categories, the at least one value in conjunction with earlier determined values being sufficient to understand the text.

According to the present invention, there is provided a program storage device readable by machine, tangibly embodying a program of instructions executable by the machine to perform method steps for training at least two classifiers to understand a natural language text, comprising: introducing entries into a database, the entries belonging to at least two semantic categories of different hierarchical levels; defining examples of natural language texts, wherein at least some of the examples include embedded syntactic tokens based on the entries; and training at least two classifiers for the at least two semantic categories using the examples or a form thereof.

According to the present invention, there is also provided a computer program product comprising a computer useable medium having computer readable program code embodied therein for training at least two classifiers to understand a natural language text, the computer program product comprising: computer readable program code for causing the computer to introduce entries into a database, the entries belonging to at least two semantic categories of different hierarchical levels; computer readable program code for causing the computer to define examples of natural language texts, wherein at least some of the examples include embedded syntactic tokens based on the entries; and computer readable program code for causing the computer to train at least two classifiers for the at least two semantic categories using the examples or a form thereof.

According to the present invention, there is further provided a program storage device readable by machine, tangibly embodying a program of instructions executable by the machine to perform method steps for understanding a natural language text, comprising performing the following in a selectively statistical manner: receiving a natural language text; extracting at least one parameter value from the text or a form thereof; identifying at least one parameter type related to each extracted parameter value; providing at least one restatement of the received text, each restatement having embedded within it at least one of the identified parameter types; extracting at least one overall category value from the at least one restatement or a form thereof; selecting a subcategory extractor corresponding to one of the extracted overall category values, and using the selected subcategory extractor to extract at least one subcategory value; choosing one of the extracted subcategory values; evaluating the at least one identified parameter type in relation to the chosen subcategory value; and concluding that the natural language text is understood.

According to the present invention, there is yet further provided a computer program product comprising a computer useable medium having computer readable program code embodied therein for understanding a natural language text, the computer program product comprising: computer readable program code for causing the computer to perform the following in a selectively statistical manner: computer readable program code for causing the computer to receive a natural language text; computer readable program code for causing the computer to extract at least one parameter value from the text or a form thereof; computer readable program code for causing the computer to identify at least one parameter type related to each extracted parameter value; computer readable program code for causing the computer to provide at least one restatement of the received text, each restatement having embedded within it at least one of the identified parameter types; computer readable program code for causing the computer to extract at least one overall category value from the at least one restatement or a form thereof; computer readable program code for causing the computer to select a subcategory extractor corresponding to one of the extracted overall category values, and use the selected subcategory extractor to extract at least one subcategory value; computer readable program code for causing the computer to choose one of the extracted subcategory values; computer readable program code for causing the computer to evaluate the at least one identified parameter type in relation to the chosen subcategory value; and computer readable program code for causing the computer to conclude that the natural language text is understood.

BRIEF DESCRIPTION OF THE DRAWINGS

In order to understand the invention and to see how it may be carried out in practice, a preferred embodiment will now be described, by way of non-limiting example only, with reference to the accompanying drawings, in which:

FIG. 1 is a block diagram of a system for understanding a natural language input and optionally executing a request included therein, according to a preferred embodiment of the present invention;

FIG. 2 is a sample screen of a knowledge base work tool, according to a preferred embodiment of the present invention;

FIG. 3 is a sample screen showing the processing of an active browsing script, according to a preferred embodiment of the present invention;

FIG. 4 is a block diagram of a module for understanding a natural language text, according to a preferred embodiment of the present invention;

FIG. 5 is a flow chart of a method for understanding a natural language text, according to a preferred embodiment of the present invention;

FIG. 6 is a flow chart of a method for evaluating extraction results, according to a preferred embodiment of the present invention;

FIGS. 7A and 7B show a sequence for employing different modules of the natural language module, according to a preferred embodiment of the present invention;

FIG. 8 is a flow chart for preparing a text for extraction, according to a preferred embodiment of the present invention;

FIG. 9 is a flow chart for selecting a classifier or pseudo classifier based on previous extraction results, according to a preferred embodiment of the present invention;

FIG. 10 is a flow chart for interaction with the submitter of a natural language text, according to a preferred embodiment of the present invention;

FIG. 11 is an entity-relationship (ER) diagram of a real time database, according to a preferred embodiment of the present invention; and

FIG. 12 is a flow chart of a method for training a natural language module, according to a preferred embodiment of the present invention.

DETAILED DESCRIPTION OF THE INVENTION

The preferred embodiment relates to a system and method for understanding natural language.

In certain preferred embodiments of the present invention, the natural language to be understood includes a request. However, the invention is not limited to understanding requests and also applies to understanding natural language inputs which do not include a request.

All examples given below are non-limiting illustrations of the invention described and defined herein.

FIG. 1 is an example of a block diagram of a system 100 for understanding natural language and, if the natural language includes a request, optionally executing the request, according to a preferred embodiment of the present invention. System 100 assumes remote access through a network, such as the Internet, but it should be evident that local access is within the scope of the invention.

It is assumed that a user inputs natural language through a client 110. Client 110 is shown here as a laptop computer; however, it should be evident that client 110 can be any input device, for example computers, PDAs (Personal Digital Assistants), phones, cellular phones, cellular phones with SMS or WAP capabilities, fax machines, scanners, etc. Depending on the type of client 110 used, pre-handling of the input may be necessary, for example speech-to-text conversion, optical character recognition, etc. These pre-handling techniques are well known in the art. An optional network profiler 120 manages remote networks (not shown), controlling network properties as well as the inflow and outflow of requests to and from the remote systems. An optional control server 130 manages the connection between client 110 and the rest of system 100, and also the internal connections within the rest of system 100.

A natural language understanding (NLU) server 140 includes the software for understanding the natural language. In order for the software on NLU server 140 to be able to comprehend the natural language, a preparation stage is necessary, which includes, for example, training activity. The preparation stage can be conducted, for example, using a knowledge builder work tool 150. A sample screen of work tool 150 is shown in FIG. 2.

Assuming that the input is a natural language request, a request implementation server 160 allows the understood request to be implemented. For example, if request implementation server 160 is an active browsing server, implementation can occur through automatic control of Internet sites during runtime. In one preferred embodiment, every request supported by system 100 is represented by a script of several lines of code that defines how and where the request should be carried out. The active browsing script can be prepared, for example, using an active browsing studio work tool 170. FIG. 3 shows a sample screen of the processing of an active browsing script so as to carry out the request. (In some preferred embodiments, browsing work tool 170 may also be used in building a real time database 445, as explained below with reference to FIG. 11.)

In a non-network system, NLU server 140 and request implementation server 160 are, in certain preferred embodiments, replaced by a natural language understanding (NLU) module 140 and a request implementation module 160, respectively. In certain preferred embodiments, NLU module or server 140 and request implementation server or module 160 are implemented separately so that, for example, NLU module/server 140 can be used with another means of request implementation or even with no request implementation means. For example, even if the natural language is assumed to include a request, the request may be understood without being implemented. In preferred embodiments where the user input is not a request, request implementation means may be unnecessary.

Similarly, in certain preferred embodiments, other means of natural language understanding could be used instead of NLU module/server 140 in a system with request implementation module/server 160. In other preferred embodiments, NLU module/server 140 and implementation module/server 160 can be integrated into one unit or separated into more than two units.

For the purposes of the description below, the term “module”, as in NLU module and request implementation module, is assumed to refer to both modules and servers, which may form part of network or non-network systems.

FIG. 4 shows an example of NLU module 140, according to a preferred embodiment of the present invention. It should be evident that the modules shown in FIG. 4 as forming part of NLU module 140 and discussed below can be integrated or divided into a smaller or larger number of modules. The separation of the functions of NLU module 140 into the modules shown in FIG. 4 is for ease of understanding only. In one preferred embodiment of the invention, the modules shown at the bottom of FIG. 4 are associated with the online (i.e. usage) stage and the modules shown at the top of FIG. 4 are associated with the offline (training) stage. FIG. 4 is discussed below in conjunction with flow charts illustrating the methods for using and training module 140. The order of the steps in one or more of the methods illustrated in the flow charts may be varied in other preferred embodiments. In other preferred embodiments, some steps in one or more of the methods in the flow charts may be omitted and/or additional steps may be added.

An example of the overall method for understanding the submission from a user is shown in FIG. 5, according to a preferred embodiment of the present invention. The method shown in FIG. 5 enables understanding of the text without compelling the text to comply with a predefined grammar.

User input is received (step 510) from a submitter, for example through client 110. As mentioned above, the input can optionally include a request. It is assumed that any necessary pre-handling of the input has already been performed, as explained above, so that the input is received by module 140 in a format compatible with module 140. In one preferred embodiment, the format is ASCII. Hereinbelow, once the input has been pre-handled into a format compatible with module 140, the input is also referred to as “text”.

The next step is preparing the text, if necessary, for processing (step 512) using a text preprocessing module 435. An action resolver module 410 decides which classifier module 420 (also sometimes termed an extractor) or pseudo-classifier static component 425 to employ (step 515). Each classifier or group of classifiers 420 extracts values belonging to a different semantic category. One or more pseudo-classifier static components 425 extract phrases belonging to one or more semantic categories that cannot be learned or do not need to be learned (as will be explained further below). The selected classifier 420 or pseudo classifier static component 425 is employed on the text (step 520), and the results of the extraction are evaluated by action resolver 410 (step 525), as will be described below. If the results of the selected classifier 420 are sufficient to understand the text (i.e. it is concluded that the text is understood) (step 535), the results are optionally output in step 530 (for example to the submitter, or to control server 130 and from there to request implementation module 160). Outputting the results is one possible way of indicating that the text is understood. Alternatively, other indications of understanding can be used, for example an indication that a request optionally included in the input was implemented.

If the results are insufficient to understand the text at this stage, but based on the results a further extraction can be performed by a classifier 420 or pseudo classifier 425 (step 540), action resolver 410 prepares the text, if necessary, for further extraction (step 512) and chooses the next classifier 420 or pseudo classifier 425 for the next semantic category to extract (step 515). Alternatively, it may not be possible to perform a further extraction, and the results are insufficient to understand the text. This situation may occur, for example, if one or more of classifiers 420 could not classify the text into any of the possible semantic meanings known to that classifier 420. In this case, NLU module 140 may be considered to have failed to understand the text and may stop any further processing (step 560). The failure can optionally be communicated to the submitter. Alternatively, and more preferably, action resolver module 410 may dialog with the submitter in step 545 and receive the submitter's response in step 550. Once the response is received, a further extraction may be performed on the submitter's response in step 520, using the classifier 420 or pseudo classifier 425 for the same semantic category, or a classifier 420/pseudo classifier 425 for another previously extracted semantic category.
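The control flow of FIG. 5 can be sketched as a loop over semantic categories. This is a minimal illustration only, assuming hypothetical extractor callables that return (value, grade) pairs and a simplified sufficiency test in which a category counts as resolved once any match is found; the function name `understand`, the `ask_submitter` callback and the dialog-turn limit are illustrative assumptions, not part of the described system.

```python
from typing import Callable, Dict, List, Optional, Tuple

Match = Tuple[str, float]  # (extracted value, grade)
# Hypothetical extractor signature: (text, results so far) -> list of matches
Extractor = Callable[[str, Dict[str, List[Match]]], List[Match]]


def understand(text: str,
               plan: List[Tuple[str, Extractor]],
               ask_submitter: Callable[[str], str],
               max_dialog_turns: int = 2) -> Optional[Dict[str, List[Match]]]:
    """Simplified FIG. 5 loop: extract each semantic category in turn,
    dialoging with the submitter when an extraction yields nothing."""
    results: Dict[str, List[Match]] = {}
    for category, extractor in plan:              # step 515: choose next extractor
        matches = extractor(text, results)        # step 520: extract, using prior results
        turns = 0
        while not matches:                        # step 535: results insufficient
            if turns >= max_dialog_turns:
                return None                       # step 560: stop processing
            answer = ask_submitter(f"Could you clarify the {category}?")  # steps 545/550
            matches = extractor(answer, results)  # extract again from the response
            turns += 1
        results[category] = matches
    return results                                # step 530: output the results
```

Passing the accumulated `results` to every extractor is what lets a later extractor (for example, for a subcategory) depend on an earlier one (for example, for the overall category).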

It should be noted that the method described above with reference to FIG. 5 includes some steps performed in a selectively statistical manner. For example, when a classifier 420 is used in step 520, the step is statistically based, whereas when a pseudo classifier 425 is used in step 520, the step is typically not statistically based.

Classifiers are well known in the art. An example of a public domain algorithm which can be used by classifiers 420 of this invention is the Naive Bayes text classification algorithm developed by Carnegie Mellon University and available on the World Wide Web at www.cs.cmu.edu/afs/cs/project/theo-11/www/naive-bayes.html. This public domain algorithm is based on “Machine Learning” by Tom Mitchell, Chapter 6, McGraw Hill 1997, ISBN 0070428077.
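To illustrate the kind of classifier meant here, the sketch below implements a small multinomial Naive Bayes text classifier in the style of Mitchell's Chapter 6: Laplace-smoothed word probabilities, with words outside the training vocabulary ignored. It is a generic sketch, not the CMU code; the class name, the whitespace tokenization and the training examples are assumptions. The ranked log-probabilities could serve as the per-match grades discussed with reference to FIG. 6.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Set, Tuple


class NaiveBayesTextClassifier:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self) -> None:
        self.doc_counts: Counter = Counter()                    # label -> number of examples
        self.word_counts: Dict[str, Counter] = defaultdict(Counter)
        self.vocab: Set[str] = set()

    def train(self, examples: Iterable[Tuple[str, str]]) -> None:
        for text, label in examples:
            words = text.lower().split()
            self.doc_counts[label] += 1
            self.word_counts[label].update(words)
            self.vocab.update(words)

    def rank(self, text: str) -> List[Tuple[str, float]]:
        """Return (label, log-probability score) pairs, best first."""
        words = text.lower().split()
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocab)
        scores = []
        for label, n_docs in self.doc_counts.items():
            counts = self.word_counts[label]
            n_words = sum(counts.values())
            score = math.log(n_docs / total_docs)              # class prior
            for w in words:
                if w in self.vocab:                            # unseen words are ignored
                    score += math.log((counts[w] + 1) / (n_words + vocab_size))
            scores.append((label, score))
        return sorted(scores, key=lambda s: s[1], reverse=True)
```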

A pseudo classifier is a type of static component 425. Pseudo classifiers do not need to be trained. Non-limiting examples of pseudo classifiers 425 include time extractors and currency extractors, which detect and extract time phrases and currency phrases respectively. In certain preferred embodiments, a data structure for time is used which keeps values for seconds, minutes, days, months, years, etc. Time extractors can therefore translate phrases such as “next Tuesday” into next Tuesday's date, or translate phrases like “for three days”, “every Sunday”, etc. In certain preferred embodiments, the data structure for money is in the form #x.yy, where x is an integer, yy is any number between 00 and 99, and # can be replaced by any currency type. Pseudo classifiers such as time and currency extractors may in some cases be based on limited strict grammars, suitable only for specific time or money expressions, and may use well known parsing methods for detecting and parsing the temporal expressions (time phrases) or currency phrases. In these preferred embodiments, the parsing may be partial, covering only the part of the given text that could be parsed based on the limited grammars, which is later transformed into a data structure that can hold the time or currency expressions. An example of grammar-based temporal expression parsing (which, as is well known in the art, can easily be modified for currency phrases) can be found at odur.let.rug.nl/~vannoord/papers/yearbook/node2.html as part of a work called “Grammatical Analysis in a Spoken Dialogue System” by Gosse Bouma, Rob Koeling, Mark-Jan Nederhof and Gertjan Van Noord.
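A pseudo classifier of this kind needs no training, only a narrow hand-written pattern. The sketch below assumes regular expressions as the "limited strict grammar": one function normalizes a currency phrase into the #x.yy form (using an ISO code such as `USD` as the # currency type), and another resolves "next <weekday>" to a date. The function names, the supported phrasings and the reading of "next Tuesday" as the first Tuesday strictly after today are all assumptions.

```python
import re
from datetime import date, timedelta
from typing import Optional

_WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday",
             "friday", "saturday", "sunday"]  # index matches date.weekday()
_CURRENCY_WORDS = {"dollar": "USD", "dollars": "USD", "euro": "EUR", "euros": "EUR"}


def extract_currency(text: str) -> Optional[str]:
    """Return the first money phrase as '#x.yy' (e.g. 'USD12.50'), or None."""
    m = re.search(r"\$\s*(\d+)(?:\.(\d{1,2}))?", text)
    if m:
        code = "USD"
    else:
        m = re.search(r"(\d+)(?:\.(\d{1,2}))?\s*(dollars?|euros?)\b", text, re.IGNORECASE)
        if not m:
            return None
        code = _CURRENCY_WORDS[m.group(3).lower()]
    cents = (m.group(2) or "0").ljust(2, "0")  # '5' -> '50', missing -> '00'
    return f"{code}{int(m.group(1))}.{cents}"


def extract_next_weekday(text: str, today: date) -> Optional[date]:
    """Resolve 'next <weekday>' to the first such day strictly after today."""
    m = re.search(r"\bnext\s+(" + "|".join(_WEEKDAYS) + r")\b", text, re.IGNORECASE)
    if not m:
        return None
    target = _WEEKDAYS.index(m.group(1).lower())
    days_ahead = (target - today.weekday()) % 7 or 7  # same weekday -> one week later
    return today + timedelta(days=days_ahead)
```

Returning `None` when nothing matches corresponds to the Boolean-false style of pseudo classifier output described further below.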

For the purpose of the description below, a semantic category should be understood to be a grouping of values sharing at least one common property which distinguishes those values from values in other semantic categories that are defined for a particular embodiment.

In preferred embodiments of the present invention, there is a hierarchical structure among the semantic categories which is exploited when understanding the text. In certain preferred embodiments, there are three semantic categories: overall category (highest level), subcategory (medium level), and parameter values (lowest level). As an example, one or more classifiers 420 or pseudo classifiers 425 may extract value(s) belonging to the overall category. The overall category value in this example can be considered the domain or topic of interest of the text. Continuing with the example, one or more classifiers 420 or pseudo classifiers 425 may extract value(s) belonging to the subcategory of the overall category, such as operations related to the overall topic of interest. Continuing with the same example, one or more classifiers 420 or pseudo classifiers 425 may extract parameter value(s). In successful extractions for certain preferred embodiments, the extracted parameter values are of the parameter types required by the extracted subcategory value. In this example, subcategory values share the common property of being subcategory values of overall category values, whereas overall category values share the common property of having subcategory values. Also in this example, parameter values share the common property of having corresponding parameter types which can be accepted by subcategory values, whereas subcategory values share the common property of typically accepting parameter values of particular parameter types. (It should be noted that in some cases no parameter types are defined for a particular subcategory value.)
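One way to hold the three-level hierarchy in memory is sketched below, assuming a simple in-memory dictionary rather than a database. Overall category values map to subcategory values, and each subcategory value lists the parameter types it accepts (possibly none). The category names reuse examples from this description; the parameter type names and the `accepts` helper are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class Subcategory:
    name: str
    parameter_types: Tuple[str, ...] = ()   # may be empty for some subcategories


@dataclass
class OverallCategory:
    name: str
    subcategories: Dict[str, Subcategory] = field(default_factory=dict)

    def add(self, sub: Subcategory) -> "OverallCategory":
        self.subcategories[sub.name] = sub
        return self                          # allows chained .add(...) calls


# Hypothetical hierarchy: overall category -> subcategory -> accepted parameter types
HIERARCHY: Dict[str, OverallCategory] = {
    "financial operation": OverallCategory("financial operation")
        .add(Subcategory("get a stock quote", ("stock symbol",)))
        .add(Subcategory("buy stocks", ("stock symbol", "quantity"))),
    "banking": OverallCategory("banking")
        .add(Subcategory("depositing", ("check", "amount")))
        .add(Subcategory("currency conversion", ("amount", "currency"))),
}


def accepts(overall: str, sub: str, parameter_type: str) -> bool:
    """True if the subcategory value under the overall value accepts the parameter type."""
    category = HIERARCHY.get(overall)
    subcategory = category.subcategories.get(sub) if category else None
    return subcategory is not None and parameter_type in subcategory.parameter_types
```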

In other preferred embodiments, there may be fewer or more semantic categories in a hierarchical structure. For example, there may be an overall category, a pre-subcategory, a subcategory, and parameter values.

In some preferred embodiments, the text may include more than one subtext (for example more than one request), with each subtext represented by a separate hierarchical structure that can be processed either in parallel or sequentially. For the sake of simplicity, it is assumed below that the text can be represented by a single hierarchical structure.

It should be noted that the example given above of a possible hierarchical structure (i.e. an overall category such as domain as the highest level, a subcategory such as operation as the medium level, and parameter values as the lowest level) conforms to one conceptual view (as typically implemented by a system analyst; see FIG. 11 below). In other words, if the following words were presented: currency conversion, banking, depositing, checks, and dollars, a common conceptual view would classify banking as belonging to the highest level category (domain), currency conversion and depositing as belonging to the middle level category (operations), and checks and dollars as belonging to the lowest level category (parameter values). As another example, out of the following words: studying, teaching, school, books and tests, school would be considered by many to belong to the highest level category, studying and teaching to the middle level category, and books and tests to the lowest level category. The hierarchical structure used in other preferred embodiments is by no means bound to this conceptual view and may embrace other conceptual views.

FIG. 6 illustrates a preferred embodiment of a method for evaluating the results of the extraction by classifier 420 (corresponding to step 525 of FIG. 5). In the earlier step 520 of FIG. 5, classifier 420 for a given semantic category searches a knowledge base 430 using the text prepared for extraction. (Details on how knowledge base 430 is developed will be explained below.) Classifier 420 returns one or more possible matches (step 610) and a grade for each match (step 620). Grading is preferably performed by classifier 420 based on the training undergone by classifier 420 during the preparation of knowledge base 430. The process iterates in step 630 until all matches and their grades are output. In other preferred embodiments, not all matches are output, but only matches that meet certain criteria, for example the highest graded matches, the most clustered matches, etc. For example, if clustered matches are to be output, matches whose grades fall within a small range are identified and considered relevant. Continuing with the example, if there are ten results whose grades are {9, 8.3, 8.1, 7.9, 7.8, 6.2, 6.1, 6, 4, 1.2}, the two clustered groups of matches correspond to {8.3 to 7.8} and {6.2 to 6}. The outputted matches may in this example be those which correspond to the higher cluster {8.3 to 7.8}, perhaps together with other outputted matches, for example the match corresponding to the highest grade 9.
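The clustered-output option can be sketched as follows, assuming that grades belong to the same cluster when consecutive grades (after sorting) differ by at most a fixed gap. The 0.5 gap, the function names and the policy of returning the top match plus the highest multi-member cluster are assumptions, chosen so that illustrative grades reproduce the {8.3 to 7.8} and {6.2 to 6} clusters of the example.

```python
from typing import List, Tuple

Match = Tuple[str, float]  # (value, grade)


def cluster_grades(matches: List[Match], max_gap: float = 0.5) -> List[List[Match]]:
    """Sort by grade (as in step 640) and start a new cluster whenever the gap
    to the previous grade exceeds max_gap."""
    ordered = sorted(matches, key=lambda m: m[1], reverse=True)
    clusters: List[List[Match]] = []
    for match in ordered:
        if clusters and clusters[-1][-1][1] - match[1] <= max_gap:
            clusters[-1].append(match)
        else:
            clusters.append([match])
    return clusters


def select_output(matches: List[Match], max_gap: float = 0.5) -> List[Match]:
    """Top-graded match plus the highest cluster that has more than one member."""
    clusters = cluster_grades(matches, max_gap)
    if not clusters:
        return []
    best = clusters[0][0]
    multi = next((c for c in clusters if len(c) > 1), [])
    return [best] + [m for m in multi if m != best]
```

With grades 9, 8.3, 8.1, 7.9, 7.8, 6.2, 6.1, 6, 4 and 1.2, this yields the clusters [9], [8.3 to 7.8], [6.2 to 6], [4] and [1.2], and `select_output` returns the grade-9 match followed by the 8.3 to 7.8 cluster.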

The matches are sorted by grade in step 640. Generally, all matches are stored until no longer needed, i.e. until subsequent processing (for example further extractions, dialoging with submitters, etc.) renders a match unlikely to be the correct match. In other preferred embodiments, the matches may be stored until the final results are output in step 530.

If the results are for the first semantic category extracted and more than one semantic category exists, then the results are considered insufficient to understand the text in step 650 (i.e. a ‘no’ answer to step 535). If there is only one existing semantic category (step 642), then the results are considered sufficient in step 670 (i.e. a ‘yes’ answer to step 535). If the results are for subsequently extracted semantic categories but cannot be evaluated in conjunction with results from previous semantic category extractions, then the results are considered insufficient in step 650 (corresponding to a ‘no’ answer to step 535). An example of a situation where the results cannot be evaluated in conjunction with previous results is if the currently extracted semantic category value(s) are not directly related to any of the previously extracted category value(s). To illustrate, assume the overall category value extracted is “financial operation” and the only two possible subcategory values for this overall category value are “get a stock quote” and “buy stocks”. If neither of these subcategory values is extracted, but instead the subcategory value “get a horoscope forecast” is extracted, then the results are considered insufficient because the extracted values for the overall category and subcategory are not related to each other.

If, on the other hand, the results can be evaluated in conjunction with results from previous semantic category extractions, then the results are evaluated in conjunction with the results from the previous sets (step 658). For example, the evaluation can determine whether the results for the current semantic category correspond to the results from previous semantic category extractions (see FIG. 7 below for more details on a possible evaluation process). If no weighted grade is to be calculated and the results are sufficient to understand the text, the method proceeds directly to step 670 (corresponding to a ‘yes’ answer to step 535). Otherwise, if no weighted grade is to be calculated and the results are insufficient, the method proceeds directly to step 650 (corresponding to a ‘no’ answer to step 535). The results may be considered sufficient to understand the text, for example, if all required values for each semantic category are known and the values for the different semantic categories correspond with one another.
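For a two-level hierarchy (overall category and subcategory), the sufficiency decisions of FIG. 6 might be sketched as below. The relatedness table reuses the “financial operation” example; treating exactly one related overall/subcategory pair as sufficient, and several pairs as still needing a weighted grade or dialog, is an assumption of this sketch, as are the names `RELATED_SUBCATEGORIES` and `evaluate`.

```python
from typing import Dict, List, Set

# Hypothetical table: overall category value -> its possible subcategory values
RELATED_SUBCATEGORIES: Dict[str, Set[str]] = {
    "financial operation": {"get a stock quote", "buy stocks"},
    "astrology": {"get a horoscope forecast"},
}


def evaluate(extracted: Dict[str, List[str]], n_categories: int = 2) -> bool:
    """True -> sufficient (step 670); False -> insufficient (step 650)."""
    overall = extracted.get("overall", [])
    subcategory = extracted.get("subcategory", [])
    if n_categories == 1:
        return bool(overall)          # step 642: only one semantic category exists
    if not (overall and subcategory):
        return False                  # only the first category has been extracted
    pairs = [(o, s) for o in overall for s in subcategory
             if s in RELATED_SUBCATEGORIES.get(o, set())]
    if not pairs:
        return False                  # values unrelated: cannot be evaluated together
    return len(pairs) == 1            # step 658: exactly one consistent combination
```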

In alternative preferred embodiments, if in step 645 the results are not for the last extracted semantic category, then the results are considered insufficient in step 650. In these embodiments, only when the results are for the last extracted semantic category does the method continue with step 655, where a decision is made on whether evaluation in conjunction with results from previous semantic category extractions is feasible.

In alternative preferred embodiments, the meaning of the text is guessed at prior to extracting all semantic categories, and/or a final conclusion on a value of a semantic category is reached prior to completing an evaluation of the results of that semantic category in conjunction with previously extracted semantic categories. (In other words, in these embodiments the threshold of “sufficiency” is lower.)

Continuing with the illustrated preferred embodiment, once the results from all semantic categories are available, an optional weighted grade may be calculated in step 660 as a final test of the combination of results from the different semantic categories. The weighted grade is derived using a formula which takes into account the grades achieved by the current results and the previous semantic category results. For example, the formula could be an average with either equal weights or different weights for each semantic category. Continuing with the example, in preferred embodiments which include an overall category and a subcategory, one possible formula might assign the overall category a weight of 2 and the subcategory a weight of 1. If the weighted grade is high enough, the results are considered sufficient in step 670 (corresponding to a ‘yes’ answer to step 535). If the weighted grade is not high enough and further evaluation (i.e. of other combinations of results from the different semantic categories) is possible, more evaluations are performed in step 658. If no further evaluation of other combinations is possible, then the results are considered insufficient in step 650 (corresponding to a ‘no’ answer to step 535).
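The optional weighted grade of step 660 can be sketched as a weighted average over the semantic categories, here with the example weights of 2 for the overall category and 1 for the subcategory. The acceptance threshold, the function names and the `related` callback (standing in for the correspondence check of step 658) are hypothetical.

```python
from typing import Callable, Dict, List, Optional, Tuple


def weighted_grade(grades: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted average of per-category grades."""
    total_weight = sum(weights[c] for c in grades)
    return sum(grades[c] * weights[c] for c in grades) / total_weight


def best_combination(overall: List[Tuple[str, float]],
                     subcategory: List[Tuple[str, float]],
                     related: Callable[[str, str], bool],
                     weights: Dict[str, float],
                     threshold: float) -> Optional[Tuple[str, str, float]]:
    """Grade every related (overall, subcategory) pair and accept the best one
    if it reaches the threshold (step 670); otherwise return None (step 650)."""
    candidates = [
        (o, s, weighted_grade({"overall": go, "subcategory": gs}, weights))
        for o, go in overall for s, gs in subcategory if related(o, s)
    ]
    if not candidates:
        return None
    best = max(candidates, key=lambda c: c[2])
    return best if best[2] >= threshold else None
```

Trying every related pair and keeping the best one is a compact stand-in for iterating over "other combinations" until one passes.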

In some preferred embodiments, pseudo classifier 425 results are also graded. For example, the time extractor may in some preferred embodiments return its results in order of decreasing probability. In certain preferred embodiments with graded pseudo classifier results, the results may be evaluated in conjunction with other results as described in the method of FIG. 6. In other preferred embodiments, pseudo classifiers return either a Boolean true if a match was extracted or a Boolean false if no match was extracted, without any grading.

FIG. 7 illustrates in more detail a preferred embodiment of a possible sequence followed by action resolver 410 in turning to different modules of natural language module 140 in order to implement the method of FIG. 5. In the preferred embodiment illustrated in FIG. 7, action resolver 410 is, for example, a state automaton. Assume, for the sake of the example, hierarchical semantic categories including overall category, subcategory and parameter values. The sequence shown in FIG. 7 processes the semantic categories in an order which takes advantage of their hierarchy, so that results for a semantic category of a certain hierarchical level help in the processing of a semantic category of another hierarchical level. Specifically, in this example there is assumed to be one classifier 420 for the overall category, a separate subcategory classifier 420 associated with each overall category value, one parameter values classifier 420, and one or more parameter values pseudo classifiers 425. Continuing with the example, parameter values, belonging to the lowest level semantic category, are extracted first. At least some of the results of the parameter values extraction are used to embed tokens into the text for overall category extraction (i.e. the highest level semantic category) and for subcategory extraction (i.e. the medium level semantic category). The results of the overall category extraction are used to select a subcategory classifier (i.e. for the medium level semantic category). The hierarchical structure of the semantic categories is therefore advantageous to the overall processing.

As the sequence (other than the dynamic features to be discussed below) is pre-programmed by the designer of natural language understanding module 140, the sequence shown in FIG. 7 is only one of many possible sequences.
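The ordering idea (lowest level first, then highest, then medium) can be sketched with the classifiers passed in as plain callables. Everything here is a stand-in: `run_sequence`, the token format and the toy classifiers in the check below are assumptions used only to show how the overall category value selects which subcategory classifier runs.

```python
from typing import Callable, Dict, Tuple

ParamExtractor = Callable[[str], Dict[str, str]]          # value -> parameter type
TokenEmbedder = Callable[[str, Dict[str, str]], str]
Classifier = Callable[[str], str]


def run_sequence(text: str,
                 extract_params: ParamExtractor,
                 embed_tokens: TokenEmbedder,
                 overall_classifier: Classifier,
                 subcategory_classifiers: Dict[str, Classifier]
                 ) -> Tuple[str, str, Dict[str, str]]:
    params = extract_params(text)                  # lowest level: parameter values first
    restated = embed_tokens(text, params)          # parameter-type tokens aid higher levels
    overall = overall_classifier(restated)         # highest level
    subcategory_classifier = subcategory_classifiers[overall]  # overall value picks classifier
    subcategory = subcategory_classifier(restated)  # medium level
    return overall, subcategory, params
```

Keeping one subcategory classifier per overall category value means each subcategory classifier only has to discriminate among the operations of a single domain.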

First, text preprocessing module 435 is employed (step 702, corresponding to step 512). Next, parameter values classifier 420 is employed (step 705, corresponding to step 520). Real time database 445 is used to identify the one or more possible corresponding parameter type(s) for each extracted parameter value. Then, parameter values pseudo classifier(s) 425 are employed (step 710, corresponding to step 520) and the corresponding parameter types are identified. Afterwards, text preprocessing module 435 is again employed in preparation for overall category classifier 420 (step 712, corresponding to step 512).
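When a parameter value can have more than one parameter type, preparing the text for overall category classification may yield several restatements of the text, one per combination of types. The sketch below assumes a dictionary standing in for real time database 445 and an angle-bracket token format; both, as well as the word-level matching and the example entries, are illustrative assumptions.

```python
from itertools import product
from typing import Dict, List

# Hypothetical stand-in for real time database 445: value -> possible parameter types
PARAMETER_TYPES: Dict[str, List[str]] = {
    "paris": ["city"],
    "jaguar": ["car model", "animal"],
}


def restatements(text: str) -> List[str]:
    """Replace each known parameter value with a type token, producing one
    restatement per combination of possible types."""
    options = []
    for word in text.split():
        types = PARAMETER_TYPES.get(word.lower())
        options.append([f"<{t}>" for t in types] if types else [word])
    return [" ".join(choice) for choice in product(*options)]
```

For example, "rent a jaguar in Paris" yields "rent a <car model> in <city>" and "rent a <animal> in <city>", leaving the overall category classifier to decide which reading fits.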

Next, overall category classifier 420 is employed (step 715corresponding to step 520). Optionally (not shown), extracted overallcategory values can be evaluated in relation to extracted parametervalues, for example by employing a first evaluation static component 425to check if the parameter types of the extracted parameter values are insync with subcategory values associated with the extracted overallcategory values.

If no overall category value is found, dialog management module 440 is employed (step 720, corresponding to step 545). If the overall category value is ambiguous (i.e. there is more than one possibility), then in some preferred embodiments dialog management module 440 is employed, but in other preferred embodiments a corresponding subcategory classifier 420 is employed for each of the possible overall category values. If dialog management module 440 has been employed, overall category classifier 420 is again employed to check the response received from the submitter (step 721, corresponding to step 520). Alternatively, if the response is obvious and does not need to be understood (for example, the response is a selection of a multiple choice option), then overall category classifier 420 does not need to be employed to check the response and step 721 may be skipped.

Examples of situations in which the method can proceed directly from step 718 to step 722 (without dialoging) include, inter alia: when only one overall category value is extracted; when there is a default overall category value for the particular embodiment; when there is only one overall category value for the particular embodiment; when more than one overall category value was extracted but the parameter types of the extracted parameter values point to one of the extracted overall category values or to subcategory values associated with one of the extracted overall category values; when more than one overall category value was extracted but one overall category value can be selected based on the grades of the extracted overall category values; and when it is decided to process more than one extracted overall category value in parallel and employ a corresponding subcategory classifier for each of those overall category values.
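The conditions listed above for skipping the dialog can be expressed as a single decision function. The sketch below is hypothetical: the function name, the grade `margin` of 0.2, and the reading of "point to" as "the candidate's parameter types cover all extracted parameter types" are assumptions made for illustration. It also treats the default value and the single-defined-value cases as fallbacks for when nothing was extracted.

```python
from typing import Dict, List, Optional, Set


def resolve_overall(graded: Dict[str, float],
                    defined_values: List[str],
                    default: Optional[str],
                    param_types: Set[str],
                    types_by_value: Dict[str, Set[str]],
                    margin: float = 0.2) -> Optional[str]:
    """Return one overall category value, or None when dialog is needed.

    graded:         extracted overall category values -> grade
    defined_values: all overall category values of the embodiment
    types_by_value: parameter types used by each value's subcategories
    margin:         hypothetical minimum grade gap for a clear winner
    """
    if len(graded) == 1:                    # exactly one value extracted
        return next(iter(graded))
    if not graded:                          # nothing extracted
        if default is not None:             # a default value exists
            return default
        if len(defined_values) == 1:        # only one value is defined at all
            return defined_values[0]
        return None
    # Several candidates: keep those whose parameter types cover the extracted ones.
    fitting = [v for v in graded
               if param_types and param_types <= types_by_value.get(v, set())]
    if len(fitting) == 1:
        return fitting[0]
    # Otherwise select by grade if the best candidate is clearly ahead.
    ranked = sorted(graded.items(), key=lambda kv: kv[1], reverse=True)
    if ranked[0][1] - ranked[1][1] >= margin:
        return ranked[0][0]
    return None  # still ambiguous: dialog, or process candidates in parallel


values = ["financial information", "car rentals"]
types = {"financial information": {"stock"},
         "car rentals": {"location", "time", "car group"}}
# Close grades, but the extracted "stock" parameter points to one domain.
print(resolve_overall({"financial information": 0.5, "car rentals": 0.45},
                      values, None, {"stock"}, types))
```

Returning `None` rather than raising leaves the choice between dialoging and parallel processing to the caller, mirroring the two alternatives described for an ambiguous overall category value.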

The corresponding subcategory classifier 420 is employed (step 722, corresponding to step 520). Evaluations are then performed (corresponding to step 658 of FIG. 6) using specific evaluation static components 425.

A first specific evaluation static component 425 is employed, which evaluates all the results of the previous extractions by extractors 420 and pseudo extractors 425 (step 725). First specific evaluation static component 425 checks extracted parameter values against the extracted subcategory value(s) to see if the parameter values (for example, based on the identified parameter types) are suitable for the extracted subcategory value(s). For example, for each extracted subcategory value, first evaluation component 425 may match the parameter type(s) identified for each extracted parameter value with the parameter types expected for the extracted subcategory value, as predefined in real time database 445. The matching in this example may leave each expected (predefined) parameter type matched with no extracted parameter value, with exactly one extracted parameter value, or with more than one extracted parameter value.
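The matching performed by the first evaluation component can be sketched as a lookup from each expected parameter type to the extracted values that carry that type. The function names and dictionary layouts below are assumptions; the sample values come from the car rental example later in this document.

```python
from typing import Dict, List


def match_parameters(extracted: Dict[str, List[str]],
                     expected_types: List[str]) -> Dict[str, List[str]]:
    """For each expected parameter type, list the extracted values that fit it.

    extracted:      parameter value -> possible parameter types (as looked up
                    in real time database 445)
    expected_types: parameter types predefined for one subcategory value
    """
    return {ptype: [value for value, types in extracted.items() if ptype in types]
            for ptype in expected_types}


def match_status(matches: Dict[str, List[str]]) -> Dict[str, str]:
    """Label each expected type as unmatched, matched once, or matched several times."""
    return {ptype: "none" if not vals else "one" if len(vals) == 1 else "several"
            for ptype, vals in matches.items()}


extracted = {"Los Angeles airport": ["location"],
             "San Francisco": ["location"],
             "compact": ["car group"]}
matches = match_parameters(extracted, ["location", "time", "car group"])
print(match_status(matches))
# {'location': 'several', 'time': 'none', 'car group': 'one'}
```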

If, based on this evaluation, no suitable subcategory value is found, dialog management module 440 is employed (step 727, corresponding to step 545). Once dialog management module 440 has been employed, subcategory classifier 420 is again employed to check the response received from the submitter (step 728, corresponding to step 520). Alternatively, if the response is obvious and does not need to be understood (for example, the response is a selection of a multiple choice option), then subcategory classifier 420 does not need to be employed to check the response and step 728 may be skipped. First specific evaluation static component 425 is again employed in step 729.

Examples of situations in which the method can continue directly from step 726 to step 735 (without dialoging) include, inter alia: when only one subcategory value is extracted; when no subcategory value is extracted but there is a default subcategory value corresponding to the overall category value; when there is only one predefined subcategory value corresponding to the overall category value; when more than one subcategory value is extracted but the parameter types of the extracted parameter values point to one of the extracted subcategory values; and when more than one subcategory value is extracted but one subcategory value can be selected based on the grades of the extracted subcategory values.

In cases where more than one parameter value of the same parameter type is defined for the subcategory value and at least one parameter value of that parameter type was extracted (step 735), a second specific static evaluation (relational) component 425 is employed. Relational evaluation component 425 evaluates the correspondence between the at least one extracted parameter value and the more than one parameter value defined for the subcategory value (step 740). For example, if two names of cities were extracted for a ticket purchase, second static evaluation component 425 recognizes which is the destination and which is the source. Continuing with the example, relational component 425 may search real time database 445 for a predefined grammar line or utterance, for example in the form:

“String/s <ParameterType X> String/s [Arg A]; String/s <ParameterType X> String/s [Arg B]”

which means that when a parameter value of type “ParameterType X” is extracted, the parameter value will be matched with the arguments A and B required by the subcategory value according to the String/s in its context. In this example, the grammar line

“From <ParameterType: City> [Arg: SourceCity]; To <ParameterType: City> [Arg: DestCity]”

allows the extracted city following the word “from” to be recognized as the source city and the extracted city following the word “to” to be recognized as the destination city.
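A minimal sketch of the relational component, assuming each grammar line is reduced to (context word, parameter type, argument name) triples and that the context word must directly precede the value. The `Rule` encoding, the `assign_arguments` function and the city names are hypothetical; a real grammar line may carry richer context strings.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical encoding of a grammar line such as
#   From <ParameterType: City> [Arg: SourceCity]; To <ParameterType: City> [Arg: DestCity]
# as (context word, parameter type, argument name) triples.
Rule = Tuple[str, str, str]


def assign_arguments(text: str, values: Dict[str, str],
                     rules: List[Rule]) -> Dict[str, str]:
    """Map extracted values (value -> parameter type) to subcategory arguments."""
    args: Dict[str, str] = {}
    for value, ptype in values.items():
        for context, rule_type, arg in rules:
            if rule_type != ptype:
                continue
            # The rule fires when the context word directly precedes the value.
            pattern = rf"\b{re.escape(context)}\s+{re.escape(value)}\b"
            if re.search(pattern, text, flags=re.IGNORECASE):
                args[arg] = value
    return args


rules = [("from", "City", "SourceCity"), ("to", "City", "DestCity")]
args = assign_arguments("I need a ticket to Boston from Chicago",
                        {"Boston": "City", "Chicago": "City"}, rules)
print(args)  # {'DestCity': 'Boston', 'SourceCity': 'Chicago'}
```

Matching on the surrounding context rather than on word order means the source and destination are recognized correctly even when the submitter mentions the destination first.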

A third specific static evaluation component 425 is employed in step 742. Third evaluation component 425 checks whether parameter values corresponding to all parameter types defined for the subcategory value were extracted (step 745). For example, third evaluation component 425 can use a check list against the parameter types predefined for the subcategory value in real time database 445. Continuing with the example, if no parameter values were extracted for certain parameter types defined for the subcategory value, third component 425 can check whether there are default parameter values which can be assigned or whether the parameter types with missing parameter values are optional. Still continuing with the example, if no parameter value, or more than one parameter value, was extracted for a mandatory parameter type (as predefined) which requires one parameter value and has no default value, then dialoging occurs.
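The check list used by the third evaluation component might look like the following sketch. `ParamSpec`, its `mandatory` and `default` fields, the example type “exchange” and the default date “today” are hypothetical. As a simplifying assumption, the sketch also sends any parameter type with several extracted values to dialog.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class ParamSpec:
    """Hypothetical definition of one parameter type for a subcategory value."""
    ptype: str
    mandatory: bool = True
    default: Optional[str] = None


def check_parameters(specs: List[ParamSpec],
                     matches: Dict[str, List[str]]) -> Tuple[Dict[str, str], List[str]]:
    """Return (resolved values, parameter types that still require dialog)."""
    resolved: Dict[str, str] = {}
    needs_dialog: List[str] = []
    for spec in specs:
        values = matches.get(spec.ptype, [])
        if len(values) == 1:                           # exactly one value: accept it
            resolved[spec.ptype] = values[0]
        elif not values and spec.default is not None:  # missing, but a default exists
            resolved[spec.ptype] = spec.default
        elif not values and not spec.mandatory:        # missing, but optional
            continue
        else:                                          # missing mandatory, or ambiguous
            needs_dialog.append(spec.ptype)
    return resolved, needs_dialog


specs = [ParamSpec("stock"), ParamSpec("date", default="today"),
         ParamSpec("exchange", mandatory=False)]
print(check_parameters(specs, {"stock": ["Intel"]}))
# ({'stock': 'Intel', 'date': 'today'}, [])
print(check_parameters(specs, {"stock": ["Intel", "Yahoo"]}))
# ({'date': 'today'}, ['stock'])
```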

If dialoging is required, dialog management module 440 is employed (step 750, corresponding to step 545). Parameter values classifier 420 and/or one or more parameter values pseudo classifier(s) 425 are again employed to evaluate the response (step 752, corresponding to step 520). Alternatively, if the response is obvious and does not need to be understood (for example, the response is a selection of a multiple choice option), then parameter values classifier 420 does not need to be employed to check the response and step 752 may be skipped. Third static evaluation component 425 is again employed in step 742.

Examples of when the dialoging of step 750 need not occur include, inter alia: when the correct number of parameter values for the parameter types required by the subcategory value was extracted, when the subcategory value requires no parameter types, and when the parameter types required by the subcategory value have default values.

After action resolver 410 finishes the sequence of employment of the various modules, for example as illustrated in FIG. 7, there should be sufficient results to understand the text (corresponding to a ‘yes’ answer to step 535). Optionally, a weighted grade can first be evaluated (step 680) as a final test that the results are sufficient.

Note that the three specific evaluation static components 425 mentioned here are described separately for ease of understanding; in alternative preferred embodiments they may be combined into fewer modules or separated into more than three modules.

One of the distinct advantages of the preferred embodiments of the present invention is the one or more dynamic aspects of the natural language understanding: later steps of the process are adaptable based on the results of earlier steps. The methods illustrated in FIGS. 8, 9, and 10 each include steps which are influenced by the results of previous steps. Each of the dynamic aspects illustrated in FIGS. 8, 9, and 10 can be separately implemented, and any one or more of the dynamic aspects constitutes a separate preferred embodiment. In FIG. 8, the results of a previous extraction may be used to develop tokens that may be embedded in the text used as an input for the next extraction(s). The tokens become part of the input for the next extraction(s) and are therefore termed syntactic tokens. In FIG. 9, more than one extractor 420 or more than one pseudo extractor 425 is available for the same semantic category, and the selection of extractor 420 or pseudo extractor 425 depends on the results of previous extractions. In FIG. 10, the dialog with a submitter can vary based on the results (including unsuccessful or no results) of previous extractions.

FIG. 8 illustrates a preferred embodiment of a method for preparing the natural language text for extraction (step 512). The first step is to determine whether the results of one or more previous extractions (by either classifiers 420 or pseudo classifiers 425) can be used to develop one or more syntactic tokens (step 810). This step is performed only during certain subsequent extractions and not for the first extraction.

If one or more syntactic tokens can be developed, the tokens are embedded into one or more restatements of the text (step 815), thereby allowing results of previous extractions to directly influence subsequent extractions. Depending on the embodiment, in the restatement of the text the embedded tokens can either replace the text parts which serve as sources for the tokens, or supplement those text parts. In certain preferred embodiments, action resolver 410 is responsible for embedding the tokens, but in other preferred embodiments other modules, such as text pre-processing module 435, embed the tokens.

As an example of a token based on an extracted parameter value, assume the parameter value “crayon” was extracted. A syntactic token of the parameter type corresponding to “crayon” (for example “parameter type: writing utensil”) may be developed and embedded in the restatement of the text.
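Step 815 can be sketched as a plain-text restatement function that either supplements or replaces the source text part. The token format follows the “parameter type: writing utensil” example above; the function name and the substring-based matching are simplifying assumptions.

```python
from typing import Dict


def embed_tokens(text: str, extracted: Dict[str, str], replace: bool = False) -> str:
    """Restate the text with syntactic tokens for previously extracted values.

    extracted maps a text part (e.g. "crayon") to its parameter type. With
    replace=False the token supplements the text part; with replace=True the
    token takes its place. Plain substring replacement is used for brevity,
    so a value occurring inside another word would also be tagged.
    """
    for part, ptype in extracted.items():
        token = f"<parameter type: {ptype}>"
        text = text.replace(part, token if replace else f"{part} {token}")
    return text


supplemented = embed_tokens("I need a red crayon", {"crayon": "writing utensil"})
replaced = embed_tokens("I need a red crayon", {"crayon": "writing utensil"}, replace=True)
print(supplemented)  # I need a red crayon <parameter type: writing utensil>
print(replaced)      # I need a red <parameter type: writing utensil>
```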

N-grams are constructed for the text or the restatements of the text (including embedded tokens) in step 820, if required. In certain preferred embodiments, n-grams are required for classifiers and some pseudo classifiers but not for all pseudo classifiers. In certain preferred embodiments, text pre-processing module 435 constructs the n-grams.

N-grams are well known in the art. A non-limiting definition of an n-gram, based partially on “Text retrieval from Document Images based on N-gram Algorithm” by Chew Lim Tan, Sam Yuan Sung, Zhaohui Yu, and Yi Xu, available at http://citeseer.nj.nec.com/400555.html, is: an N-gram is a sequence of N consecutive items of a stream, obtained by sliding an N-item wide window over the text one item forward at a time. Every possible N-gram is given a number, the so-called hash key. How the N-grams are numbered is not important, as long as each instance of a certain N-gram is always given the same number and distinct numbers are assigned to different N-grams.
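The definition above can be illustrated with word-level n-grams and a dictionary that hands out numbers in order of first appearance. This satisfies the stated requirement: the same n-gram always gets the same number, and different n-grams get different numbers. Using words as the items and sequential numbering are assumptions made for the illustration.

```python
from typing import Dict, List, Tuple


def ngrams(items: List[str], n: int) -> List[Tuple[str, ...]]:
    """Slide an n-item window over the items one position at a time."""
    return [tuple(items[i:i + n]) for i in range(len(items) - n + 1)]


class NGramNumberer:
    """Assign each distinct n-gram a stable number (its "hash key")."""

    def __init__(self) -> None:
        self._keys: Dict[Tuple[str, ...], int] = {}

    def key(self, gram: Tuple[str, ...]) -> int:
        # A known n-gram keeps its number; a new one gets the next unused number.
        return self._keys.setdefault(gram, len(self._keys))


words = "to be or not to be".split()
numberer = NGramNumberer()
keys = [numberer.key(g) for g in ngrams(words, 2)]
print(keys)  # [0, 1, 2, 3, 0]
```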

Preferably the n-grams are sparse n-grams (i.e. they also reflect the distance between words in the text). The use of sparse n-grams in some preferred embodiments of the present invention is advantageous: sparse n-grams improve the probability of correct natural language understanding because they take into account the specific order of words in a sentence. In preferred embodiments of the present invention, sparse n-grams are also used in training (see the discussion below with regard to FIG. 12).

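As an example, the sparse n-grams used are words, doublets, and triplets in the following form, where $w_1, \ldots, w_n$ are the words of the text:

$(w_i w_j w_k, \mathrm{True})$ where $j = i+1$, $k = i+2$, $i = 1, \ldots, n-2$

$(w_i w_j w_k, \mathrm{False})$ where $i < j < k \le n$, $i = 1, \ldots, n-2$

$(w_i w_j, \mathrm{True})$ where $j = i+1$, $i = 1, \ldots, n-1$

$(w_i w_j, \mathrm{False})$ where $i < j \le n$, $i = 1, \ldots, n-1$

$(w_i, \mathrm{True})$ where $i = 1, \ldots, n$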

Note that in this example, following the comma in each sparse n-gram there is a True or False indicator, which can be seen as an “adjacency” indicator. If a doublet or triplet is composed of words which are adjacent in the text, a “True” indicator is included in the sparse n-gram; single words always carry a “True” indicator. It should be noted that in this example, in order to gain flexibility, if the doublet or triplet is composed of words which are adjacent in the text, two sparse n-grams are created, one with a “True” indicator and one with a “False” indicator. The added flexibility enables a match between the n-grams generated from the text and the n-grams created for the trained sentence (see FIG. 12 below), whether or not these words were adjacent in the trained sentence. In this example, if the doublet or triplet is composed of words which are not adjacent in the text, only a sparse n-gram with a “False” indicator is created.
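The exemplary sparse n-gram forms can be generated with `itertools.combinations`, which yields index-ordered tuples (i < j < k) and therefore produces the False forms directly, adjacent combinations included. This is a sketch of the example forms only; the function name and the plain (string, bool) tuple representation are assumptions. Applied to “price for Columbia yesterday”, it yields the same 19 sparse n-grams listed in the worked example later in this section, in a different order.

```python
from itertools import combinations
from typing import List, Tuple


def sparse_ngrams(words: List[str]) -> List[Tuple[str, bool]]:
    """Build sparse n-grams in the exemplary form.

    Adjacent doublets/triplets are emitted twice (True and False); non-adjacent
    ones only with False; single words always with True.
    """
    grams: List[Tuple[str, bool]] = []
    for n in (3, 2):
        # Adjacent sequences: (w_i ... w_{i+n-1}, True)
        grams += [(" ".join(words[i:i + n]), True) for i in range(len(words) - n + 1)]
        # Every index-ordered combination (adjacent ones included): (..., False)
        grams += [(" ".join(c), False) for c in combinations(words, n)]
    grams += [(w, True) for w in words]
    return grams


result = sparse_ngrams("price for Columbia yesterday".split())
print(len(result))  # 19
```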

The constructed n-grams are used as the input to the selected classifier 420 or pseudo classifier 425 (step 825).

In other embodiments of the invention, the construction of n-grams may be skipped (i.e. step 820 is skipped) and the selected classifier 420 and/or pseudo classifier 425 may extract based on other techniques, for example word spotting.

FIG. 9 illustrates a preferred embodiment of how the results of previous extractions can influence the selection of the next classifier 420 or pseudo classifier 425. The method of FIG. 9 may be included in step 515. As mentioned above, in preferred embodiments of the present invention the sequence of semantic category extractions is preprogrammed. However, in cases where there is more than one classifier 420 or more than one pseudo classifier 425 for a semantic category, the method of FIG. 9 allows the selection of the appropriate classifier 420 or pseudo classifier 425. In step 905, a decision is made on whether more than one classifier 420 or pseudo classifier 425 is available for the semantic category to be extracted, for example by checking the structure of real time database 445. If not, the one available is employed (i.e. the method proceeds directly to step 520). If so, the method continues with step 910, where a further decision is made whether a selection of fewer than all available classifiers 420 or fewer than all available pseudo classifiers 425 for the semantic category can be made based on previous extractions. If a selection is possible, the selection is made in step 915. Otherwise, results of the previous extractions are clarified in step 920, for example by dialoging with the submitter. In other cases, in step 920 all available classifiers or all available pseudo classifiers for the semantic category are used; for example, looking back at FIG. 7, in step 710 all available parameter values pseudo classifiers are initially employed.
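The decision of steps 905 to 920 can be sketched as follows, assuming (hypothetically) that the candidate classifiers for a semantic category are registered in a dictionary keyed by the value of the previous, higher-level extraction. At step 920 the sketch takes the "use all available classifiers" branch; a dialog with the submitter is the other option described above.

```python
from typing import Callable, Dict, List, Optional

Classifier = Callable[[str], str]


def select_classifiers(available: Dict[str, Classifier],
                       previous_value: Optional[str]) -> List[Classifier]:
    """Choose the classifier(s) for the next semantic category.

    available maps each value of the previous (higher-level) extraction to the
    classifier registered for it; this keying is a hypothetical layout.
    """
    if len(available) == 1:              # step 905: only one classifier exists
        return list(available.values())
    if previous_value in available:      # steps 910/915: previous result decides
        return [available[previous_value]]
    # Step 920: the previous result is unclear. This sketch takes the
    # "use all available classifiers" branch; a dialog could be used instead.
    return list(available.values())


subcategory_classifiers: Dict[str, Classifier] = {
    "financial information": lambda text: "get stock quote",
    "car rentals": lambda text: "make a car rental reservation",
}
chosen = select_classifiers(subcategory_classifiers, "car rentals")
print([clf("rent a compact in San Francisco") for clf in chosen])
# ['make a car rental reservation']
```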

As an example, assume that the semantic categories include an overall category and a subcategory, and also assume that there is more than one subcategory classifier 420, with a different subcategory classifier 420 used depending on the overall category value. If the results of the overall category extraction are clear, then in step 915 the subcategory classifier 420 corresponding to the extracted overall category value is employed. If the results of the overall category extraction are unclear, then clarification is received in step 920.

As another example, assume that there is an additional hierarchical level so that the (four) semantic categories include an overall category, a pre-subcategory, a subcategory and parameter values. In this example, a different pre-subcategory classifier 420 is selected depending on the overall category value, and a different subcategory classifier 420 is selected depending on the pre-subcategory value. Further added hierarchical levels can be processed in a similar manner.

As yet another example, assume that there is a plurality of parameter value pseudo classifiers 425. Assume also that in step 745 (FIG. 7) it is found that a certain parameter type required by the extracted subcategory value, for example a money expression, is missing. In this case, a question is asked, and once the answer from the submitter is received, only the currency pseudo classifier 425 (corresponding to the missing parameter type) out of all pseudo classifiers 425 is selected and employed on the answer. As another example, assuming more than two parameter value pseudo classifiers 425, if it is found in step 745 that a money expression and a time expression are missing, then currency pseudo classifier 425 and time pseudo classifier 425, out of all the available pseudo classifiers 425, are employed on the answer.
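The selection of pseudo classifiers for missing parameter types can be sketched with a registry keyed by parameter type. The regular expressions below are deliberately crude, hypothetical stand-ins for pseudo classifiers 425 (for example, the money extractor recognizes only dollar amounts).

```python
import re
from typing import Callable, Dict, List

# Hypothetical pseudo classifiers keyed by the parameter type they extract.
PSEUDO: Dict[str, Callable[[str], List[str]]] = {
    "money": lambda s: re.findall(r"\$\d+(?:\.\d{2})?", s),
    "time": lambda s: re.findall(r"\b\d{1,2}:\d{2}\b", s),
    "date": lambda s: re.findall(r"\b\d{4}-\d{2}-\d{2}\b", s),
}


def extract_missing(answer: str, missing_types: List[str]) -> Dict[str, List[str]]:
    """Run only the pseudo classifiers whose parameter types are missing."""
    return {t: PSEUDO[t](answer) for t in missing_types if t in PSEUDO}


found = extract_missing("$50 at 10:30", ["money", "time"])
print(found)  # {'money': ['$50'], 'time': ['10:30']}
```

The date extractor is never run on this answer, because only the money and time parameter types were reported missing.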

FIG. 10 illustrates a preferred embodiment of a method for dialoguing with the submitter of the natural language (step 545). Additional user information may be required, for example, to resolve an ambiguity, provide a missing piece of information, or restate the submission. The answers received from the submitter via the dialog augment previous extraction results so as to aid in understanding the natural language text. The term augment is used to include one or more of the following: clarify, supplement, pinpoint, expand, narrow, etc.; i.e. the answers from the dialog allow the text to be better understood than if the dialog had not taken place and only the previous extraction results were available.

If further processing cannot be performed (step 1020, corresponding to step 540 of FIG. 5, step 718, 726 or 745 of FIG. 7, or step 920 of FIG. 9), dialog management module 440 is called by action resolver 410 (step 1030). Ambiguity may result from more than one correct interpretation of a submission (for example, a request for the flight schedules leaving New York City can be interpreted as leaving Kennedy or La Guardia airport, and more information from the submitter would be required to resolve the ambiguity). Further processing may also not be possible if information is missing (for example, if parameter values were not extracted for all parameter types required by the subcategory value). In some preferred embodiments, dialog management module 440 searches real time database 445 for predefined questions associated with one or more categories. In other preferred embodiments, dialog management module 440 does not pose predefined questions, but instead formulates questions. In either case, open questions and multiple choice questions may be used according to the type of missing information. In one preferred embodiment, ambiguity problems result in a close-ended question, whereas missing information (for example, values of missing categories) results in an open-ended question. In step 1050, the question is posed to the submitter.

In preferred embodiments of the present invention, there is no need to design in advance a dialog tree which covers all possible questions for all possible missing information or ambiguities. Instead, dialog is created on the fly, or predefined question strings are retrieved from real time database 445, based on system logic previously inserted in real time database 445. The decision as to what and when to ask is taken by dialog management module 440 based on this system logic and the current step in the understanding process of FIG. 5. The system logic should be understood to mean logic inputted into real time database 445 to aid in the natural language understanding, which as a bonus also aids in question formulation and question string retrieval. For example, the logic for a subcategory value can include the parameter types related to that subcategory value, characteristics of these parameter types independently and in relation to one another (such as when the types are mandatory), the relative importance of each of these parameter types, etc.

In preferred embodiments of the present invention, the question posed to the submitter is varied based on previous extraction results (where results in this context can also include non-results, i.e. unsuccessful extraction). For example, when formulating a question, the previous extraction results can be compared to the logic in order to formulate an appropriate question. Continuing with the example of the previous paragraph, if parameter values for two parameter types related to the subcategory value are missing but the logic dictates that one type is more important, then a first question formulated and posed to the submitter may relate only to the more important type. As another example, a predefined question may include all required parameter types for a given subcategory according to the logic; however, the question strings retrieved from database 445 and used in the question posed to the submitter will relate only to those parameter types with no previously extracted parameter values. As another example, a predefined question may include all possible airports in New York State, but if the previous extractions extracted New York City, the multiple-choice question posed to the submitter will be modified so as to offer as possible responses only airports in New York City.
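The way previous results shape the question can be sketched as follows: a close-ended question for ambiguity, an open-ended question about only the most important missing parameter type, and a predefined answer list narrowed by an earlier extraction. The function names, the numeric `importance` ranking and the question wording are hypothetical.

```python
from typing import Dict, List, Optional


def formulate_question(candidates: List[str], missing: List[str],
                       importance: Dict[str, int]) -> Optional[str]:
    """Vary the question with previous extraction results.

    Ambiguity (several candidates) -> close-ended multiple-choice question.
    Missing information -> open-ended question about the most important
    missing parameter type only (importance is hypothetical system logic).
    """
    if len(candidates) > 1:
        options = ", ".join(f"{chr(ord('a') + i)}) {c}" for i, c in enumerate(candidates))
        return f"Please clarify: {options}"
    if missing:
        most_important = max(missing, key=lambda t: importance.get(t, 0))
        return f"Please specify the {most_important}."
    return None  # nothing to ask


def narrow_choices(choices: Dict[str, str], extracted_city: str) -> List[str]:
    """Offer only predefined answers consistent with an earlier extraction."""
    return [answer for answer, city in choices.items() if city == extracted_city]


airports = {"Kennedy": "New York City", "La Guardia": "New York City",
            "Albany": "Albany"}
print(formulate_question(narrow_choices(airports, "New York City"), [], {}))
# Please clarify: a) Kennedy, b) La Guardia
print(formulate_question([], ["return time", "pickup location"],
                         {"pickup location": 2, "return time": 1}))
# Please specify the pickup location.
```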

Real time database 445 includes entries for the semantic categories. As mentioned above, in some preferred embodiments real time database 445 also includes dialog questions and/or suggested answers.

A structure for real time database 445 according to a preferred embodiment of the present invention, which assumes a hierarchical structure of semantic categories, will now be explained. Refer to FIG. 11, showing an entity-relationship (ER) diagram of database 445. The entries in database 445 are divided into four types. The first type includes entries related to the semantic category “overall category” 1105. The second type includes entries related to the semantic category “subcategory” 1115. Each overall category entry 1105 has a number of subcategory entries 1115. Each subcategory entry 1115 requires or accepts certain parameter type entries 1125, which form the third type. Each parameter type entry 1125 is able to take on one or more parameter value entries 1135, which form the fourth type. The parameter value entries 1135 are related to the semantic category “parameter values”. The invention is not bound by the illustrated ER structure or contents.
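The ER structure of FIG. 11 can be mirrored with Python dataclasses. This is a sketch under assumptions: class and field names are illustrative, and parameter types are held in a top-level dictionary so that their values are not nested under any single subcategory. The sample data is taken from the car rental and financial information example at the end of this section.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ParameterType:                 # parameter type entry 1125
    name: str
    values: List[str] = field(default_factory=list)  # parameter value entries 1135


@dataclass
class Subcategory:                   # subcategory entry 1115
    name: str
    parameter_types: List[str] = field(default_factory=list)  # names of 1125 entries


@dataclass
class OverallCategory:               # overall category entry 1105
    name: str
    subcategories: List[Subcategory] = field(default_factory=list)


@dataclass
class RealTimeDatabase:
    overall: List[OverallCategory]
    # Parameter types (and their values) are kept at top level rather than
    # nested under subcategories, so several subcategories can share them.
    parameter_types: Dict[str, ParameterType]

    def types_for(self, subcategory: str) -> List[str]:
        """Return the parameter types required or accepted by a subcategory."""
        for category in self.overall:
            for sub in category.subcategories:
                if sub.name == subcategory:
                    return sub.parameter_types
        raise KeyError(subcategory)


db = RealTimeDatabase(
    overall=[
        OverallCategory("financial information",
                        [Subcategory("get stock quote", ["stock"])]),
        OverallCategory("car rentals",
                        [Subcategory("get address of dealership", ["location"]),
                         Subcategory("make a car rental reservation",
                                     ["location", "time", "car group"])]),
    ],
    parameter_types={
        "stock": ParameterType("stock", ["Intel", "Yahoo", "Microsoft", "AT&T"]),
        "location": ParameterType("location", ["Los Angeles airport", "San Francisco"]),
        "time": ParameterType("time"),  # values come from a pseudo classifier
        "car group": ParameterType("car group", ["compact", "sports"]),
    },
)
print(db.types_for("make a car rental reservation"))
# ['location', 'time', 'car group']
```

The empty `time` entry reflects the note below that values of some parameter types are extracted by pseudo classifiers rather than stored in the database.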

It should be noted that although parameter value entries 1135 are not stored under subcategory entries 1115 in the illustrated example of database 445, under a common conceptual view parameter values would be considered to be of a lower hierarchical level, as explained above. For convenience, in the illustrated example, parameter value entries 1135 are not stored under subcategory entries 1115, so that the same parameter value entry 1135 can relate to more than one subcategory entry 1115. It should also be noted that in many preferred embodiments, not all parameter values related to parameter type entries 1125 are stored as parameter value entries 1135 in database 445. For example, a particular subcategory entry 1115 may be associated with a parameter type entry 1125 (for example, date) whose parameter values are extracted by pseudo classifier 425 and are therefore not stored as entries in database 445.

The definition of the hierarchical structure of database 445 and the characterization of the entries into the different hierarchical levels are typically performed by a system analyst with knowledge of the requirements of a particular implementation, and are therefore beyond the scope of this invention.

In some preferred embodiments, entries for one or more categories are manually entered in database 445. In other preferred embodiments, entries for one or more categories can be at least partially automatically gathered from the Internet, preferably using active browsing studio work tool 170. In preferred embodiments including request implementation through control of Internet sites, this approach implies that at least part of the information used in building database 445 originates from the medium where request implementation takes place.

FIG. 12 shows a method for training real time database 445 so as to generate knowledge base 430 (used by classifiers 420), according to a preferred embodiment which assumes the same semantic categories as in FIG. 11. The first step 1205 is the defining of natural language examples. The second step 1210, if required for some examples, is the embedding of syntactic tokens based on entries in real time database 445 within the natural language examples. Tokens can, for example, include overall category entries 1105, subcategory entries 1115, parameter type entries 1125 and/or parameter value entries 1135. Train database 450 preferably includes some examples with embedded tokens and some examples without embedded tokens (step 1212), so that classifiers 420 are trained to understand text which includes proper nouns (for example, Intel) and/or common nouns (for example, stocks). The next step 1215 is the transformation of the examples into n-grams, preferably sparse, if required (for example, if it is expected that in operation the extraction will be performed on n-grams). In some preferred embodiments, the transformation is performed by pre-processing module 435. The examples, in the form of n-grams if required, are input into classifiers 420 in step 1220. In one preferred embodiment, examples with embedded tokens corresponding to parameter type entries 1125 are used for training classifiers 420 for the overall category and subcategory. In one embodiment, examples with embedded tokens corresponding to parameter value entries 1135 are used for training classifiers 420 for parameter values. The classifiers are trained in step 1225; the same algorithm referenced above with respect to classifiers 420 can be used in training step 1225. Knowledge base 430 is obtained from the training in the final step 1230. In one preferred embodiment, knowledge base 430 is a data structure that is saved to disk so that knowledge base 430 can be used later.
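Steps 1212 to 1230 can be sketched end to end as below. The classifier algorithm is not specified in this section, so a simple n-gram overlap counter is swapped in as a stand-in for classifiers 420. The `features`, `train` and `classify` names, the `<stock>`-style tokens (shortened from “<parameter type: stock>”) and the example sentences are assumptions. As in step 1212, the sketch mixes examples with and without embedded tokens.

```python
from collections import Counter, defaultdict
from itertools import combinations
from typing import Dict, List, Tuple


def features(text: str) -> List[Tuple[str, bool]]:
    """Words and doublets in the sparse n-gram form (triplets omitted for brevity)."""
    w = text.split()
    feats = [(x, True) for x in w]
    feats += [(" ".join(w[i:i + 2]), True) for i in range(len(w) - 1)]
    feats += [(" ".join(pair), False) for pair in combinations(w, 2)]
    return feats


def train(examples: List[Tuple[str, str]]) -> Dict[str, Counter]:
    """Turn labelled examples into n-grams and count them per label."""
    kb: Dict[str, Counter] = defaultdict(Counter)
    for text, label in examples:
        kb[label].update(features(text))
    return dict(kb)


def classify(kb: Dict[str, Counter], text: str) -> str:
    """Return the label whose training n-grams overlap the input most."""
    feats = features(text)
    return max(kb, key=lambda label: sum(1 for f in feats if f in kb[label]))


knowledge_base = train([
    ("I want a price for <stock> at <date>", "Nasdaq"),        # with tokens
    ("I want to trade with stocks", "Nasdaq"),                 # without tokens
    ("I want a price for a stay at <hospital>", "hospital policies"),
])
print(classify(knowledge_base, "price for Columbia <stock> yesterday <date>"))
# Nasdaq
```

The embedded `<stock>` and `<date>` tokens, rather than the word “Columbia”, are what tip the input toward the Nasdaq domain, which is the effect the token-embedding step aims for.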

In certain preferred embodiments, knowledge work tool 150 assists in the training process. For example, once an operator of the training has chosen an entry from database 445 and the placement of a token in an example, tool 150 can develop a token from the entry and embed the token in the correct place. As an additional example, work tool 150 can employ pre-processing module 435 for developing n-grams. In certain preferred embodiments, work tool 150 also operates the training of classifiers 420 and saves the results in knowledge base 430.

In some preferred embodiments, the training phase of knowledge base 430 is completely separate from the usage phase, i.e. all training is performed prior to use of knowledge base 430. In other preferred embodiments, knowledge base 430 continues to be expanded during the usage phase, by learning from the texts received from submitters.

An example will now be given to further illustrate certain aspects of the overall process of FIG. 5 and the sub-processes of FIGS. 6 to 10. Assume a user submission of “price for Columbia yesterday” is received in step 510. The submission in the example includes a request. The text is prepared for extraction in step 512 by constructing the following n-grams (step 820). In the example, it is assumed that the n-grams are sparse and comply with the exemplary form defined above with reference to FIG. 8:

(price for Columbia, True)

(for Columbia yesterday, True)

(price for Columbia, False)

(for Columbia yesterday, False)

(price for yesterday, False)

(price Columbia yesterday, False)

(price for, True)

(for Columbia, True)

(Columbia yesterday, True)

(price for, False)

(for Columbia, False)

(Columbia yesterday, False)

(price Columbia, False)

(price yesterday, False)

(for yesterday, False)

(price, True)

(for, True)

(Columbia, True)

(yesterday, True)

Action resolver 410 selects one classifier 420 to employ in step 515. In this case it is assumed that there are three types of classifiers 420: one overall category classifier (the domain of interest); a subcategory classifier corresponding to each overall category value (the requested operation for the domain of interest); and one parameter values classifier (the items required by the operation). Parameter values classifier 420 and all available parameter value pseudo classifier(s) 425 are employed in step 520. The parameter values pseudo classifier 425 which is a time phrase extractor extracts one item: Feb. 6, 2001, i.e. the date yesterday. The parameter type of Feb. 6, 2001 is identified as date. Parameter values classifier 420 extracts items from the word “Columbia”, along with the grades of the items (steps 610 and 620). Real time database 445 is used to identify the parameter types, i.e. item types, of the different items extracted from the word “Columbia”. Assume that five items are extracted, corresponding to Columbia as a country, a university, a hospital, and twice as a stock. Due to the ambiguity, more than one item and the corresponding item types are saved. The text is prepared for the next extraction in step 512, using all possible item-type matches. The syntactic tokens (in this example, the item types) are embedded into the text in step 815. In this example, it is assumed that an identical item type is embedded in only one restatement of the text, even if more than one item of that item type was found (in this example, the tokens supplement “Columbia”):

price for Columbia <parameter type:hospital> yesterday <date>

price for Columbia <parameter type:country> yesterday <date>

price for Columbia <parameter type:university> yesterday <date>

price for Columbia <parameter type:stock> yesterday <date>
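The restatements above can be produced by keeping one entry per distinct item type, so that the two stock items for “Columbia” yield a single restatement. The function name and the item labels are hypothetical; as in the example, the tokens supplement the word rather than replacing it.

```python
from typing import List, Tuple


def restatements(text: str, word: str, items: List[Tuple[str, str]]) -> List[str]:
    """One restatement of the text per distinct item type found for `word`.

    items holds (item, item type) pairs; items sharing an item type produce a
    single restatement. The token supplements the word rather than replacing it.
    """
    item_types: List[str] = []
    for _item, item_type in items:
        if item_type not in item_types:
            item_types.append(item_type)
    return [text.replace(word, f"{word} <parameter type:{t}>") for t in item_types]


items = [("Columbia (country)", "country"), ("Columbia University", "university"),
         ("Columbia Hospital", "hospital"), ("Columbia Records", "stock"),
         ("Columbia Hospital", "stock")]
lines = restatements("price for Columbia yesterday <date>", "Columbia", items)
for line in lines:
    print(line)
```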

Sparse n-grams are constructed in step 820 for each of the four item-type matches (which now include the embedded tokens). It should be noted that when constructing the sparse n-grams, the embedded tokens are treated as words and as an integral part of the text. Action resolver 410 employs domain extractor 420 on the new sparse n-grams in step 825. Results are evaluated in step 525 (see the method of FIG. 6). It is assumed that two possible domains are output in step 610. The first domain is hospital policies (i.e. prices for a stay at Columbia Hospital) and the second domain is Nasdaq (the market where the stocks of Columbia Records and Columbia Hospital are listed). It is assumed that the second domain is output with a higher grade in step 620. However, it is also assumed that the grades are close enough that action resolver 410 decides to approach the submitter (step 545). Dialog management module 440 is called (step 1030), which in this example searches real time database 445 for a question (step 1040). In this example, the question and answers (adapted to the two possible extracted domains) are “Please clarify the topic of interest: a) the hospital policies of Columbia Hospital, b) the stock results of Columbia Hospital, or c) the stock results of Columbia Records.” The question is posed to the submitter in step 1050. The user response is assumed to be “hospital stock”, implying the stock results of Columbia Hospital (step 550), which is input into domain classifier 420 (step 520) to extract Nasdaq.

Therefore the operation extractor 420 related to Nasdaq is selected in step 915 and employed in step 520. The sparse n-grams earlier derived from

price for Columbia <parameter type:stock> yesterday <date>

are input into the operation extractor 420 related to Nasdaq. The result of the operation extraction is “get stock price”. The operation is evaluated in conjunction with previous results in step 658. The operation “get stock price” requires parameters of type stock and date, and parameter values of both types have been extracted. As a final test, a weighted grade is calculated, which is assumed to be sufficiently high (steps 660 and 665). The request is therefore assumed to have been correctly understood (i.e. sufficient results, step 670) and the results are output in step 530.

In order for the request to have been correctly understood, it is assumed that classifiers 420 had been previously trained. As an example, assume that the following examples were defined in step 1205:

“I want a price for <parameter type: stock> at <date:exact date>”. This example, in which parameter-type tokens were embedded in step 1210, may have been transformed into n-grams in step 1215 and used in step 1225 to train domain classifier 420 for the Nasdaq domain and/or operation classifier 420 for the operation “stock quote”.

“I want to trade with stocks”. This example, which has no embedded tokens, may have been transformed into n-grams in step 1215 and used to train domain classifier 420 for the Nasdaq domain (and possibly for the domains of other stock markets).

“University Columbia of New York” may have been used to train parameter values classifier 420 for the item Columbia University of New York.

“Columbia Medical” may have been used to train parameter values classifier 420 for Columbia Hospital, corresponding to both the hospital parameter type and the stock parameter type.

To further illustrate the flowcharts of FIGS. 6, 7, 8, 9, 10, and 12, another comprehensive example is presented. In the example, there are assumed to be two possible overall category values (here domains), “financial information” and “car rentals”. The subcategory values (here operations) for “financial information” are “get stock quote”, “get stock rate of change”, “get stock high value”, and “get stock low value”, each of which is associated with a parameter value of parameter type “stock”. There are assumed to be two subcategory values for “car rentals”. The first is “get address of dealership”, which is associated with a parameter value of parameter type “location”. The second is “make a car rental reservation”, which is associated with parameter values of parameter types “location”, “time”, and “car group”; two locations are required (pickup and return) and two times are required (pickup time and return time). It is also assumed that parameter values of parameter type “stock” include Intel, Yahoo, Microsoft, AT&T, etc. Parameter values of parameter type “location” (as in Avis dealership location) include Los Angeles airport, Los Angeles downtown, San Francisco, Sacramento, etc. No specific time parameter values are specified for the “time” parameter type. Parameter values of parameter type “car group” (as in rental car group) include compact, sub compact, sports, 2-door, etc.

The table below summarizes the scope of the example:

Overall Category (domain) | Subcategory (operation) | Parameter Types (arguments)
Financial Information | Get Stock Quote | Stock
Financial Information | Get Stock Rate Of Change | Stock
Financial Information | Get Stock High Value | Stock
Financial Information | Get Stock Low Value | Stock
Car Rentals | Get address of dealership | Location
Car Rentals | Make Car Rental Reservation | Location (pickup), Location (return), Time (pickup), Time (return), Car Group

Parameter Type | Parameter Values
Stock | Intel, Yahoo, Microsoft, AT&T, . . .
Location (Avis Dealership) | LA Airport, LA Downtown, San Francisco, Sacramento, . . .
Time | No specific items
Car Group (rental) | Compact, Sub Compact, Sports, 2-Door, . . .

Referring to FIG. 6, assume that the text in this example is the request “get a quote for Intel”. In steps 610 to 640, the parameter values extracted by parameter value classifier 420 and/or parameter values pseudo classifier 425 are output. In this example, only one parameter value, “Intel”, is extracted. In step 645, because this is the first semantic category extracted, the results are insufficient.

Assume that overall category classifier 420 is then called and applied to n-grams created from a restatement of the original text. The restatement includes a token based on the result of the parameter value extraction, i.e. “Get a quote for <ParameterType Stock>” (in this example the token replaces “Intel”). In steps 610 to 640, the results output by overall category classifier 420 are the two possible domains, with financial information receiving a high grade and car rentals a low grade. The results are sorted by grade in step 640, and in step 655 they are evaluated in conjunction with the parameter value results. Because the subcategory value is still unknown, the results are considered insufficient.

Assume then that the subcategory classifier 420 corresponding to the overall category value “financial information” is called in steps 610 to 658. The results include the operation with the highest grade, assumed to be “Get Stock Quote”. The results are checked for compliance with previous results. The evaluation shows that the highest graded operation is a member of the found domain and that the found parameter value is of a type accepted by the found operation as an argument. In step 660, a weighted grade corresponding to the highest graded operation is calculated by a simple formula giving equal weights to each semantic category, and the weighted grade is checked against a given threshold. If the weighted grade is below the threshold, in step 675 evaluation can be attempted for other sets of results with lower grades (for example, sets including a lower graded operation), checking in each case whether the resulting weighted grade exceeds the given threshold.
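One possible reading of steps 660 to 675 is sketched below. The “simple formula giving equal weights to each semantic category” is taken to be the plain average of one grade per category. Candidate combinations are tried in descending order of that average until a consistent one clears the threshold. The threshold value, the helper names, the membership table, and the grades are hypothetical.

```python
from itertools import product

THRESHOLD = 0.7  # hypothetical value for the "given threshold"


def weighted_grade(grades):
    """Equal weight for each semantic category: the plain average."""
    return sum(grades) / len(grades)


def best_interpretation(candidates, is_consistent, threshold=THRESHOLD):
    """Pick the best consistent combination of results.

    candidates holds one list of (value, grade) pairs per semantic category,
    e.g. [domains, operations, parameter values]. Combinations are tried in
    descending order of weighted grade, so lower-graded alternatives are only
    considered when better ones fail the consistency check.
    """
    combos = sorted(product(*candidates),
                    key=lambda combo: weighted_grade([g for _, g in combo]),
                    reverse=True)
    for combo in combos:
        score = weighted_grade([g for _, g in combo])
        if score < threshold:
            break  # every remaining combination scores lower still
        values = [v for v, _ in combo]
        if is_consistent(values):
            return values, score
    return None  # nothing sufficient: dialog with the submitter or give up


# Hypothetical membership table; this simplified check only verifies that
# the operation belongs to the domain, not the parameter type.
OPERATION_DOMAIN = {"Get Stock Quote": "Financial Information",
                    "Get Stock High Value": "Financial Information"}


def consistent(values):
    domain, operation, _parameter_value = values
    return OPERATION_DOMAIN.get(operation) == domain


# Invented grades for the request "get a quote for Intel".
print(best_interpretation(
    [[("Financial Information", 0.9), ("Car Rentals", 0.1)],
     [("Get Stock Quote", 0.8), ("Get Stock High Value", 0.3)],
     [("Intel", 1.0)]],
    consistent))
```

Enumerating every combination with `itertools.product` is only practical for the small candidate lists in this example.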

Referring to FIG. 7, assume instead that the text in the example is the request “rent a car tomorrow morning in LA airport until March 13th at noon, return to Sacramento”.

In step 702, the text is preprocessed into n-grams, because in this example n-grams are assumed to be the input to classifiers 420 and/or pseudo classifiers 425. The n-grams are of the sparse form described above with reference to FIG. 8.

In steps 705 to 715, parameter values classifier 420 and pseudo classifiers 425 are initially called. The extracted parameter values are LA Airport, Feb. 8, 2001 08:00 (tomorrow's date), Mar. 13, 2001 12:00, and Sacramento. The text is restated to include tokens based on the found parameter values, namely: “rent a car <ParameterType : Time> in <ParameterType : AvisDealershipLocation> until <ParameterType : Time> return to <ParameterType : AvisDealershipLocation>”. New n-grams are created from the restated text, again using the sparse n-gram form described above, with the embedded tokens treated as words. Overall category classifier 420 is called and extracts the car rentals domain.

In step 718, because the overall category was unambiguously found, the method proceeds with step 722. (If there had been ambiguity with regard to the domain, dialoging with the user could take the form of posing a closed multiple choice question to the submitter which includes the two possible domains as choices.)

In steps 722 to 725, subcategory classifier 420 is called. First static evaluation component 425 is then called to try to find a match between the parameter types of the found parameter values and the expected arguments of the highest graded extracted operation. In this example, because the request text is clear regarding the desired operation, subcategory classifier 420 returns only one operation. Static evaluation component 425 matches the parameter types “Time” and “AvisDealershipLocation” of the extracted parameter values with the corresponding arguments of the “Make Car Rental Reservation” operation.
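The matching done by the first static evaluation component can be sketched as slot counting. Each argument of the operation can absorb one extracted value of its type. Any leftover types are ones the operation cannot accept. The argument list and function name below are a hypothetical encoding of the “Make Car Rental Reservation” operation, not a structure given in the text.

```python
from collections import Counter

# Hypothetical argument types of "Make Car Rental Reservation":
# two locations, two times and a car group.
MAKE_RESERVATION_ARGS = ["AvisDealershipLocation", "AvisDealershipLocation",
                         "Time", "Time", "CarGroup"]


def match_parameter_types(found_types, expected_args):
    """Match extracted parameter types against an operation's arguments.

    Returns (matched, unmatched): types that fill an argument slot, and types
    the operation cannot accept (a sign the operation may be the wrong one).
    Each argument slot can be filled at most once.
    """
    free_slots = Counter(expected_args)
    matched, unmatched = [], []
    for ptype in found_types:
        if free_slots[ptype] > 0:
            free_slots[ptype] -= 1
            matched.append(ptype)
        else:
            unmatched.append(ptype)
    return matched, unmatched


found = ["Time", "AvisDealershipLocation", "Time", "AvisDealershipLocation"]
print(match_parameter_types(found, MAKE_RESERVATION_ARGS))
```

Counting slots rather than testing set membership matters here because this operation accepts the same type twice.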

Because the subcategory value was unambiguously found, no dialoging is required and the method proceeds with step 735. (If there had been ambiguity, a typical multiple choice question could display as choices either all available operations for the found domain or only those operations for the found domain which received a high grade from subcategory classifier 420.)

In steps 735 to 740, because some parameter types are acceptable for more than one argument of the found operation, the second static evaluation component (Relational Static Component) 425 must be called. In this example, both Time and AvisDealershipLocation are each accepted twice as arguments by the operation “make car rental reservation”. Relational static component 425 identifies which values belong to which arguments by checking the context of the values. The time value Mar. 13, 2001 12:00 is recognized as the return time due to the preceding word “until”, and the value Sacramento is recognized as the return location due to the preceding words “return to”. Once these values are assigned to the correct arguments of the operation, the assignment of the other Time and AvisDealershipLocation values follows naturally.
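The context check of the relational static component can be sketched by inspecting the words immediately before each value's source text. The cue table, function name, and triple format below are hypothetical. Only the cues “until” and “return to” come from the example. A value with no cue falls back to the pickup role, which mirrors “the other values follow naturally”.

```python
# Cue phrases assumed to mark the "return" role; the example cites "until"
# for the return time and "return to" for the return location.
RETURN_CUES = {"Time": ("until",), "AvisDealershipLocation": ("return to",)}


def assign_roles(text, found):
    """Assign pickup/return roles to repeated parameter types by context.

    found lists (parameter_type, value, source_text) triples. A value whose
    source text is directly preceded by a return cue becomes the return
    argument; any other value of that type becomes the pickup argument.
    """
    lowered = text.lower()
    roles = {}
    for ptype, value, source in found:
        start = lowered.find(source.lower())
        if start < 0:
            continue  # source text not found verbatim; leave unassigned
        before = lowered[:start].rstrip(" ,")
        cues = RETURN_CUES.get(ptype, ())
        role = "return" if any(before.endswith(c) for c in cues) else "pickup"
        roles[(ptype, role)] = value
    return roles


request = ("rent a car tomorrow morning in LA airport until March 13th "
           "at noon, return to Sacramento")
found = [("Time", "Feb. 8, 2001 08:00", "tomorrow morning"),
         ("AvisDealershipLocation", "LA Airport", "LA airport"),
         ("Time", "Mar. 13, 2001 12:00", "March 13th at noon"),
         ("AvisDealershipLocation", "Sacramento", "Sacramento")]
roles = assign_roles(request, found)
print(roles)
```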

In steps 742 to 752, third static component 425 is called to check whether all required arguments have been assigned suitable values. In this example, third static component 425 finds that four of the five arguments have values assigned, and the car group argument is as yet unassigned. Therefore, in step 750, a dialog with the submitter poses either an open question prompting the submitter to enter the car group, or a closed question including all possible (predefined) car groups as choices. Once the answer is received, the last required parameter is known and results can be output.
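The completeness check of the third static component, and the choice between an open and a closed question, can be sketched as follows. The argument names, function names, and the rule “use a closed question whenever predefined choices exist” are assumptions. The text only says that either form may be used.

```python
REQUIRED_ARGUMENTS = ["pickup location", "return location",
                      "pickup time", "return time", "car group"]

# Predefined choices per argument (partial list, as in the example).
# Arguments without predefined choices, such as times, get an open question.
PREDEFINED_CHOICES = {"car group": ["Compact", "Sub Compact", "Sports", "2-Door"]}


def missing_arguments(assigned):
    """Required arguments of the operation that still have no value."""
    return [arg for arg in REQUIRED_ARGUMENTS if assigned.get(arg) is None]


def question_kind(argument):
    """Assumed rule: closed multiple choice if choices exist, else open."""
    choices = PREDEFINED_CHOICES.get(argument)
    return ("closed", choices) if choices else ("open", None)


assigned = {"pickup location": "LA Airport", "return location": "Sacramento",
            "pickup time": "Feb. 8, 2001 08:00",
            "return time": "Mar. 13, 2001 12:00"}
for argument in missing_arguments(assigned):
    print(argument, question_kind(argument))
```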

Referring to FIG. 8, it is assumed that the text is the same request as in FIG. 7, namely “rent a car tomorrow morning in LA airport until March 13th at noon, return to Sacramento”. Step 810 checks whether there are any previous results that can be developed into tokens. In this example, tokens can be developed for the parameter types Time and AvisDealershipLocation, which correspond to the extracted parameter values. A token is developed for each text part that had been used as a source for extraction of a parameter value.

In step 815, the developed tokens are embedded in the text in place of the source texts that were used to extract the parameter values. In this example, the original request “rent a car tomorrow morning in LA airport until March 13th at noon, return to Sacramento” is restated as “rent a car <ParameterType: Time> in <ParameterType : AvisDealershipLocation> until <ParameterType : Time> return to <ParameterType : AvisDealershipLocation>”.
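Step 815 can be sketched as span replacement. Each source text that yielded a parameter value is replaced by a token naming its parameter type. The `(source_text, parameter_type)` input format and the function name `restate` are hypothetical. Punctuation outside the replaced spans is kept, so the comma before “return to” survives.

```python
def restate(text, extractions):
    """Embed a <ParameterType : X> token in place of each source text.

    extractions lists (source_text, parameter_type) pairs (hypothetical
    format). Spans are located in the original text first, so a token that
    was already inserted can never be matched again; spans are assumed not
    to overlap.
    """
    spans = []
    for source, ptype in extractions:
        start = text.find(source)
        if start < 0:
            continue  # source not found verbatim; nothing to replace
        spans.append((start, start + len(source), ptype))
    pieces, cursor = [], 0
    for start, end, ptype in sorted(spans):
        pieces.append(text[cursor:start])
        pieces.append(f"<ParameterType : {ptype}>")
        cursor = end
    pieces.append(text[cursor:])
    return "".join(pieces)


request = ("rent a car tomorrow morning in LA airport until March 13th "
           "at noon, return to Sacramento")
extractions = [("tomorrow morning", "Time"),
               ("LA airport", "AvisDealershipLocation"),
               ("March 13th at noon", "Time"),
               ("Sacramento", "AvisDealershipLocation")]
print(restate(request, extractions))
```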

In steps 820 to 825, new n-grams are constructed from the restatement in the sparse n-gram form described above with reference to FIG. 8. Some of the n-grams include tokens, which are dealt with as regular words.
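The exact sparse n-gram form referred to in the text is not reproduced in this section. Purely for illustration, the sketch below assumes that a sparse n-gram is a pair of words with at most `max_gap` other words between them (a skip-bigram). That definition, the tokenizer regex, and the function names are assumptions. The point shown is that embedded tokens are kept whole and then treated exactly like words.

```python
import re

# Embedded tokens such as "<ParameterType : Time>" are matched first, whole.
TOKEN_RE = re.compile(r"<[^>]+>|[\w']+")


def tokenize(text):
    """Split into lowercase words, keeping embedded tokens as single units."""
    words = []
    for piece in TOKEN_RE.findall(text):
        if piece.startswith("<"):
            words.append(re.sub(r"\s+", "", piece))  # "<ParameterType:Time>"
        else:
            words.append(piece.lower())
    return words


def sparse_bigrams(words, max_gap=1):
    """Word pairs (w_i, w_j), i < j, with at most max_gap words between them."""
    grams = set()
    for i, first in enumerate(words):
        for j in range(i + 1, min(i + 2 + max_gap, len(words))):
            grams.add((first, words[j]))
    return grams


words = tokenize("rent a car <ParameterType : Time> in LA")
print(words)
print(sorted(sparse_bigrams(words, max_gap=1)))
```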

Refer now to FIG. 9. Again assume the text is the request “rent a car tomorrow morning in LA airport until March 13th at noon, return to Sacramento”. In step 905, the answer is yes when deciding whether there is more than one possible subcategory classifier 420 which can be called. In this example, a selection needs to be made between the two possible subcategory classifiers 420: one that classifies operations for the Financial Information domain and one that classifies operations for the Car Rentals domain.

In steps 910 to 920, because the domain “car rentals” is assumed to have already been found, the car rentals subcategory classifier 420 is used. (If there were still ambiguity with regard to the correct domain after using the overall category classifier, dialoging in step 920 would be attempted to clarify the correct domain.)
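The selection in steps 905 to 920 amounts to a lookup keyed by the found domain. In this minimal sketch, the classifier type, the function name, and the lambda stand-ins with fixed grades are hypothetical placeholders for trained subcategory classifiers 420.

```python
from typing import Callable, Dict, Optional

# An operation (subcategory) classifier maps text to {operation: grade}.
OperationClassifier = Callable[[str], Dict[str, float]]


def select_operation_classifier(
        classifiers: Dict[str, OperationClassifier],
        found_domain: Optional[str]) -> Optional[OperationClassifier]:
    """Choose which subcategory classifier to call.

    With a single classifier there is nothing to choose. Otherwise the
    classifier of the already found domain is used; None means the domain
    is still ambiguous and must be clarified with the submitter.
    """
    if len(classifiers) == 1:
        return next(iter(classifiers.values()))
    if found_domain in classifiers:
        return classifiers[found_domain]
    return None


# Toy stand-ins for the two trained classifiers (invented grades).
classifiers = {
    "Financial Information": lambda text: {"Get Stock Quote": 0.9},
    "Car Rentals": lambda text: {"Make Car Rental Reservation": 0.95},
}
chosen = select_operation_classifier(classifiers, "Car Rentals")
print(chosen("rent a car tomorrow morning in LA airport"))
```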

Refer to FIG. 10. Assume now that two texts are received from the submitter, the first text being “LA Airport to Sacramento, tomorrow morning until 13/3/2001 at noon” and the second text being “Intel”.

In step 1020, redundant interactions with the submitter are avoided by performing additional automatic processing to try to solve any problems without the help of the submitter. Assume that the parameter values (LA airport, Sacramento, Feb. 8, 2001 08.00, and March 13, 12.00) and the domain (car rentals) have been extracted from the first text. Although the operation is not given in the first text, further processing can be performed using the first static evaluation component 420 to determine the desired operation. This component compares the parameter types of the extracted parameter values with the accepted arguments of the available operations, thereby avoiding dialoging. However, after the second and third static components 420 are called, the car group value is still missing, so dialoging with the submitter is required to obtain it. Turning now to the second text, neither the overall category nor the subcategory can be extracted from the text alone. However, the overall category can be extracted from a restatement which includes a token based on an extracted parameter value, i.e. (<parameterType : Stock>). This restatement implies that the desired operation accepts the stock parameter type as an argument. In this example, only operations in the financial information domain (and not in the car rental domain) accept such values. Therefore, the domain can be determined without dialoging. However, after the subcategory classifier is called, the operation is still ambiguous, because all four operations in this domain accept stock as an argument. Therefore, dialoging with the submitter is required to allow the submitter to select the correct operation.

In steps 1030 to 1050, dialog management module 440 is called if no further processing is possible. Dialog management module 440 generates the correct interaction based on the current status of the handling of the request. If dialog management module 440 is called while processing the first text to determine the car group value, it needs to create an interaction for determining the car group parameter value. Therefore, dialog module 440 goes to real time database 445 and finds the string that was prepared as a question specifically for this case, i.e. a question regarding the lack of a value for this specific argument. If dialog module 440 is called for the second text in order to determine the operation, it needs to create an interaction that clarifies an ambiguity in the operation and presents the submitter with all possible options. Therefore, dialog module 440 goes to real time database 445 and finds the string that was prepared for this specific case, i.e. an operation ambiguity interaction. Once the question is formatted, it is transferred to the submitter and the submitter's reply is analyzed.
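The lookup of prepared strings by case can be sketched with a dictionary standing in for real time database 445. The keys, template wording, and function name are hypothetical. A specific string for (case, argument) is preferred. Otherwise a generic string for the case is used, and any choices are formatted as a lettered multiple choice list.

```python
# Hypothetical stand-in for real time database 445: prepared question
# strings keyed by (case, argument); argument None marks the generic string.
PREPARED_QUESTIONS = {
    ("missing_argument", "car group"): "Which car group would you like? {choices}",
    ("missing_argument", None): "Please provide the {argument}.",
    ("operation_ambiguity", None): "What would you like to do? {choices}",
}


def build_question(case, argument=None, choices=()):
    """Look up the prepared string for the current case and fill it in."""
    template = (PREPARED_QUESTIONS.get((case, argument))
                or PREPARED_QUESTIONS[(case, None)])
    listed = ", ".join(f"{chr(ord('a') + i)}) {c}" for i, c in enumerate(choices))
    return template.format(argument=argument, choices=listed)


print(build_question("missing_argument", "car group", ["Compact", "Sports"]))
print(build_question("operation_ambiguity",
                     choices=["Get Stock Quote", "Get Stock High Value"]))
```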

Refer to FIG. 12. In this example, the initial creation of knowledge base 430 includes the following steps. In step 1205, natural language examples are defined for the supported domains, operations and parameter values. For example, the following examples may be used, inter alia, for training:

“I want to receive financial information”

Domain: Financials

“I want to get a stock quote”

Operation: Get Stock Quote

“I would like to rent a car”

Domain: Car Rentals

“I would like to rent a car”

Operation: Make Car Rental Reservation

“Intel”

Parameter value: Intel

“Los Angeles Airport”

Parameter value: LA Airport

In step 1210, tokens are embedded in some of the above examples. For example:

“I want to get a stock quote for <ParameterType : Stock>”

Operation: Get Stock Quote

“I would like to rent a car <ParameterType : Time> in <ParameterType : AvisDealershipLocation>”

Domain: Car Rentals

In steps 1212 to 1230, the training examples are turned into n-grams and the classifiers are trained on the n-grams, with the results serialized into knowledge base 430. Typically, the training process is classifier-specific, allowing the examples in their n-gram representation to be associated with the categories and values on whose examples those n-grams were trained.
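The text leaves the classifier type open, so the sketch below swaps in a multinomial naive Bayes over unigram and bigram features with add-one smoothing, purely as an example of a classifier-specific training step. The model is a plain dictionary and can therefore be serialized with `json`, playing the role of the knowledge base. The feature scheme, function names, and the two training examples (taken from the lists above) are illustrative assumptions.

```python
import json
import math
import re
from collections import Counter, defaultdict

# Embedded tokens like "<ParameterType : Stock>" are kept as single features.
TOKEN_RE = re.compile(r"<[^>]*>|[\w']+")


def features(text):
    """Unigrams plus adjacent bigrams of the lowercased text."""
    words = TOKEN_RE.findall(text.lower())
    return words + [f"{a} {b}" for a, b in zip(words, words[1:])]


def train(examples):
    """examples: (text, label) pairs. Returns a JSON-serializable model."""
    counts = defaultdict(Counter)
    priors = Counter()
    for text, label in examples:
        counts[label].update(features(text))
        priors[label] += 1
    vocab = sorted({f for c in counts.values() for f in c})
    return {"counts": {label: dict(c) for label, c in counts.items()},
            "priors": dict(priors), "vocab": vocab}


def classify(model, text):
    """Return {label: log score}; higher means more likely."""
    total = sum(model["priors"].values())
    v = len(model["vocab"])
    scores = {}
    for label, counts in model["counts"].items():
        denom = sum(counts.values()) + v  # add-one smoothing
        score = math.log(model["priors"][label] / total)
        for f in features(text):
            score += math.log((counts.get(f, 0) + 1) / denom)
        scores[label] = score
    return scores


examples = [
    ("I want to get a stock quote for <ParameterType : Stock>", "Get Stock Quote"),
    ("I would like to rent a car <ParameterType : Time> in "
     "<ParameterType : AvisDealershipLocation>", "Make Car Rental Reservation"),
]
knowledge_base = json.dumps(train(examples))   # serialize the trained model
restored = json.loads(knowledge_base)          # later: load it back
scores = classify(restored, "get a quote for <ParameterType : Stock>")
print(max(scores, key=scores.get))
```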

It will also be understood that the system according to the invention may be a suitably programmed computer. Likewise, the invention contemplates a computer program being readable by a computer for executing the method of the invention. The invention further contemplates a machine-readable memory tangibly embodying a program of instructions executable by the machine for executing the method of the invention.

While the invention has been described with respect to a limited number of embodiments, it will be appreciated that many variations, modifications and other applications of the invention may be made.

1. A method for use in a method for understanding a natural language text, comprising performing the following selectively in a statistical manner: attempting to extract at least one value belonging to a semantic category from a natural language text or a form thereof; and if a result of said attempting complies with a predetermined criterion, attempting to extract, based on said result, at least one value belonging to another semantic category of a different hierarchical level than said semantic category, else performing at least one action from a group of actions including: asking a submitter of said text a question whose content depends on said result and giving up on understanding said natural language text.
2. The method of claim 1, wherein said attempting to extract at least one value belonging to said another semantic category includes: selecting at least one classifier or pseudo classifier for said another semantic category from among more than said at least one classifier or pseudo classifier for said another semantic category, wherein said selecting is based on at least one extracted value belonging to said semantic category; and employing said at least one classifier or pseudo classifier in an attempt to extract at least one value belonging to said another semantic category.
3. The method of claim 2, wherein said another semantic category is a hierarchically lower level semantic category than said semantic category.
4. The method of claim 3, wherein said at least one value belonging to said another semantic category is at least one operation and said at least one value belonging to said semantic category is at least one domain.
5. The method of claim 1, wherein said form are n-grams constructed from said text or from a restatement of said text which includes at least one embedded token.
6. The method of claim 5, wherein said n-grams are sparse n-grams.
7. The method of claim 1, wherein said asking a question includes: formulating said content of said question on the fly based on said result.
8. The method of claim 1, wherein said asking a question includes: modifying a predefined question based on said result.
9. The method of claim 1, further comprising: if said question is asked, attempting to extract at least one value belonging to a previously extracted semantic category from said answer.
10. The method of claim 9, wherein said previously extracted semantic category is said semantic category.
11. The method of claim 1, wherein said at least one value belonging to a semantic category and said at least one value belonging to another semantic category are at least one from a group including: at least one domain, at least one operation, and at least one parameter value.
12. The method of claim 1, wherein said attempting to extract includes employing at least one pseudo classifier in order to attempt to extract at least one parameter value.
13. The method of claim 12, wherein at least one of said pseudo classifiers is a time extractor and said employing includes employing said time extractor in order to attempt to extract at least one time expression.
14. The method of claim 12, wherein at least one of said pseudo classifiers is a currency extractor and said employing includes employing said currency extractor in order to attempt to extract at least one currency expression.
15. The method of claim 12, wherein at least one of said pseudo classifiers is employed on a part of said natural language text or a form thereof.
16. The method of claim 1, wherein said content is based on a comparison between said result and predetermined logic related to said semantic category.
17. The method of claim 16, wherein said logic was devised to assist in understanding said natural language text.
18. The method of claim 1, wherein said question relates to less than all non-extracted values, based on relative importance.
19. A program storage device readable by machine, tangibly embodying a program of instructions executable by the machine to perform method steps for use in a method for understanding a natural language text, comprising performing the following selectively in a statistical manner: attempting to extract at least one value belonging to a semantic category from a natural language text or a form thereof; and if a result of said attempting complies with a predetermined criterion, attempting to extract, based on said result, at least one value belonging to another semantic category of a different hierarchical level than said semantic category, else performing at least one action from a group of actions including: asking a submitter of said text a question whose content depends on said result and giving up on understanding said natural language text.
20. A computer program product comprising a computer useable medium having computer readable program code embodied therein for use in a computer program product comprising: computer readable program code for causing the computer to perform the following selectively in a statistical manner: computer readable program code for causing the computer to attempt to extract at least one value belonging to a semantic category from a natural language text or a form thereof; and computer readable program code for causing the computer, if a result of said attempting complies with a predetermined criterion, to attempt to extract, based on said result, at least one value belonging to another semantic category of a different hierarchical level than said semantic category, else to perform at least one action from a group of actions including: asking a submitter of said text a question whose content depends on said result and giving up on understanding said natural language text.